Abstract
Describable face attributes are labels that can be assigned to a face image to describe its characteristics. Examples include gender, age, ethnicity, face shape, and nose size. Predicting face attributes in the wild is challenging because of complex face variations. This chapter presents in depth the recent progress and current state-of-the-art approaches to some of the fundamental challenges in face attribute recognition, particularly from the angle of deep learning. We highlight effective techniques for training deep convolutional networks to predict face attributes in the wild and for addressing the imbalanced distribution of attributes. In addition, we discuss the use of face attributes as rich contexts that, in return, facilitate accurate face detection and face alignment. The chapter ends by posing an open question for the face attribute recognition challenge arising from emerging and future applications.
Notes
1. The method is also applicable to other visual recognition problems that encounter imbalanced class distributions.
2. An imposter of a data point \(x_i\) is another data point \(x_j\) with a different class label, \(y_i \ne y_j\).
3.
4.
5. The layers of a CNN have neurons arranged in three dimensions: width, height, and depth, where depth refers to the third dimension of an activation volume.
6. IoU denotes Intersection over Union.
7. Data and code for this work are available at http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html.
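Two of the notes above state definitions that are easy to make concrete. The following is a minimal sketch, not the chapter's implementation: `iou` computes Intersection over Union for two axis-aligned boxes, and `imposters` lists the indices of points whose class label differs from that of a given point, following the definition in the note literally. The function names, the `(x_min, y_min, x_max, y_max)` box convention, and the use of integer labels are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if any.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def imposters(i: int, labels: Sequence[int]) -> List[int]:
    """Indices j with y_j != y_i, i.e. the imposters of point i."""
    return [j for j, y in enumerate(labels) if y != labels[i]]
```

For example, `iou((0, 0, 2, 2), (1, 1, 3, 3))` has an intersection area of 1 and a union area of 7, and `imposters(0, [0, 1, 0, 1])` selects the points at indices 1 and 3.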
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Loy, C.C., Luo, P., Huang, C. (2017). Deep Learning Face Attributes for Detection and Alignment. In: Feris, R., Lampert, C., Parikh, D. (eds) Visual Attributes. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-50077-5_8
DOI: https://doi.org/10.1007/978-3-319-50077-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50075-1
Online ISBN: 978-3-319-50077-5
eBook Packages: Computer Science, Computer Science (R0)