Deep Learning Face Attributes for Detection and Alignment

Loy, Chen Change; Luo, Ping; Huang, Chen

doi:10.1007/978-3-319-50077-5_8

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

Describable face attributes are labels that can be assigned to a face image to describe its characteristics, for example gender, age, ethnicity, face shape, and nose size. Predicting face attributes in the wild is challenging because of complex face variations. This chapter provides an in-depth account of recent progress and current state-of-the-art approaches to some of the fundamental challenges in face attribute recognition, particularly from a deep learning perspective. We highlight effective techniques for training deep convolutional networks to predict face attributes in the wild and for addressing the imbalanced distribution of attributes. In addition, we discuss how face attributes can in turn serve as rich context to facilitate accurate face detection and face alignment. The chapter ends by posing an open question about face attribute recognition that arises from emerging and future applications.

Notes

1. The method is also applicable to other visual recognition problems with imbalanced class distributions.
2. An impostor of a data point \(x_i\) is another data point \(x_j\) with a different class label, \(y_i \ne y_j\).
3. Employing clustering to aid classification is common in the literature [6, 65].
4. http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html.
5. The neurons in each layer of a CNN are arranged in three dimensions: width, height, and depth, the last being the third dimension of an activation volume.
6. IoU stands for Intersection over Union.
7. Data and code for this work are available at http://mmlab.ie.cuhk.edu.hk/projects/TCDCN.html.
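Notes 2 and 6 introduce two terms that are easy to pin down in code. The Python/NumPy sketch below is illustrative only and is not taken from the chapter. `iou` computes Intersection over Union for two axis-aligned boxes, assuming corner coordinates (x1, y1, x2, y2). `impostors` returns the impostors of \(x_i\) among its k nearest neighbours under Euclidean distance; restricting the search to the k nearest neighbours is an assumption borrowed from the large margin nearest neighbour setting of Weinberger and Saul, since note 2 defines an impostor by label alone. Both function names are hypothetical.

```python
from typing import List, Sequence

import numpy as np


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection over Union of two boxes given as (x1, y1, x2, y2).

    The corner-coordinate box format is an assumption of this sketch.
    """
    # Corners of the intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp widths/heights at zero so disjoint boxes give zero overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def impostors(X: np.ndarray, y: np.ndarray, i: int, k: int) -> List[int]:
    """Indices j among the k nearest neighbours of X[i] with y[j] != y[i].

    Limiting the search to k nearest neighbours (Euclidean distance) is a
    hypothetical choice following the LMNN setting, not the chapter's method.
    """
    dists = np.linalg.norm(X - X[i], axis=1)
    dists[i] = np.inf  # never count the point itself as a neighbour
    neighbours = np.argsort(dists)[:k]
    # An impostor carries a class label different from that of x_i
    return [int(j) for j in neighbours if y[j] != y[i]]
```

In detection benchmarks such as PASCAL VOC, a predicted box is typically counted as correct when its IoU with a ground-truth box exceeds 0.5; for instance, `iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7, well below that threshold.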

References

[1] Arbeláez, P., Pont-Tuset, J., Barron, J.T., Marques, F., Malik, J.: Multiscale combinatorial grouping. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[2] Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting with constrained local models. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[3] Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. IEEE Trans. Pattern Anal. Mach. Intell. 35(12), 2930–2940 (2013)
[4] Berg, T., Belhumeur, P.N.: POOF: part-based one-versus-one features for fine-grained categorization, face verification, and attribute estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[5] Bourdev, L., Maji, S., Malik, J.: Describing people: a poselet-based approach to attribute classification. In: International Conference on Computer Vision (ICCV) (2011)
[6] Boureau, Y.L., Le Roux, N., Bach, F., Ponce, J., LeCun, Y.: Ask the locals: multi-way local pooling for image recognition. In: International Conference on Computer Vision (ICCV) (2011)
[7] Burgos-Artizzu, X., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: International Conference on Computer Vision (ICCV) (2013)
[8] Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
[9] Chechik, G., Shalit, U., Sharma, V., Bengio, S.: An online algorithm for large scale image similarity learning. In: Conference on Neural Information Processing Systems (NIPS) (2009)
[10] Chen, D., Ren, S., Wei, Y., Cao, X., Sun, J.: Joint cascade face detection and alignment. In: European Conference on Computer Vision (ECCV) (2014)
[11] Chen, K., Gong, S., Xiang, T., Loy, C.C.: Cumulative attribute space for age and crowd density estimation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[12] Chrysos, G.G., Antonakos, E., Snape, P., Asthana, A., Zafeiriou, S.: A comprehensive performance evaluation of deformable face tracking “in-the-wild”. arXiv preprint arXiv:1603.06015 (2016)
[13] Chung, J., Lee, D., Seo, Y., Yoo, C.D.: Deep attribute networks. In: NIPS Workshop on Deep Learning and Unsupervised Feature Learning (2012)
[14] Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. IEEE Trans. Pattern Anal. Mach. Intell. 23(6), 681–685 (2001)
[15] Cootes, T.F., Ionita, M.C., Lindner, C., Sauer, P.: Robust and accurate shape model fitting using random forest regression voting. In: European Conference on Computer Vision (ECCV) (2012)
[16] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2009)
[17] Dong, C., Loy, C.C., He, K., Tang, X.: Learning a deep convolutional network for image super-resolution. In: European Conference on Computer Vision (ECCV) (2014)
[18] Dong, C., Loy, C.C., He, K., Tang, X.: Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2015)
[19] Everingham, M., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010)
[20] Face++: http://www.faceplusplus.com/
[21] Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: a library for large linear classification. J. Mach. Learn. Res. 9, 1871–1874 (2008)
[22] Farabet, C., Couprie, C., Najman, L., LeCun, Y.: Learning hierarchical features for scene labeling. IEEE Trans. Pattern Anal. Mach. Intell. 35(8), 1915–1929 (2013)
[23] Girshick, R.: Fast R-CNN. In: International Conference on Computer Vision (ICCV) (2015)
[24] Hadsell, R., Chopra, S., LeCun, Y.: Dimensionality reduction by learning an invariant mapping. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2006)
[25] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)
[26] Hu, Y., Lam, K.M., Qiu, G., Shen, T.: From local pixel structure to global image super-resolution: a new face hallucination framework. IEEE Trans. Image Process. 20(2), 433–445 (2011)
[27] Huang, C., Loy, C.C., Tang, X.: Discriminative sparse neighbor approximation for imbalanced learning. arXiv preprint arXiv:1602.01197 (2016)
[28] Huang, C., Loy, C.C., Tang, X.: Learning deep representation for imbalanced classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[29] Huang, H., He, H., Fan, X., Zhang, J.: Super-resolution of human face image using canonical correlation analysis. Pattern Recogn. 43(7), 2532–2543 (2010)
[30] Huang, Z., Zhao, X., Shan, S., Wang, R., Chen, X.: Coupling alignments with recognition for still-to-video face recognition. In: International Conference on Computer Vision (ICCV), pp. 3296–3303 (2013)
[31] Jain, V., Learned-Miller, E.: FDDB: a benchmark for face detection in unconstrained settings. Technical Report UM-CS-2010-009, University of Massachusetts, Amherst (2010)
[32] Jain, V., Learned-Miller, E.: Online domain adaptation of a pre-trained cascade of classifiers. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2011)
[33] Jin, Y., Bouganis, C.S.: Robust multi-image based blind face hallucination. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
[34] Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[35] Koestinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization. In: First IEEE International Workshop on Benchmarking Facial Image Analysis Technologies (2011)
[36] Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Conference on Neural Information Processing Systems (NIPS) (2012)
[37] Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Attribute and simile classifiers for face verification. In: International Conference on Computer Vision (ICCV) (2009)
[38] Kumar, N., Berg, A.C., Belhumeur, P.N., Nayar, S.K.: Describable visual attributes for face verification and image search. IEEE Trans. Pattern Anal. Mach. Intell. 33(10), 1962–1977 (2011)
[39] Le, V., Brandt, J., Lin, Z., Bourdev, L., Huang, T.S.: Interactive facial feature localization. In: European Conference on Computer Vision (ECCV) (2012)
[40] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Handwritten digit recognition with a back-propagation network. In: Conference on Neural Information Processing Systems (NIPS) (1990)
[41] Li, J., Zhang, Y.: Learning SURF cascade for fast and accurate object detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[42] Li, H., Hua, G., Lin, Z., Brandt, J., Yang, J.: Probabilistic elastic part model for unsupervised face detector adaptation. In: International Conference on Computer Vision (ICCV) (2013)
[43] Li, H., Lin, Z., Brandt, J., Shen, X., Hua, G.: Efficient boosted exemplar-based face detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[44] Liu, Z., Li, X., Luo, P., Loy, C.C., Tang, X.: Semantic image segmentation via deep parsing network. In: International Conference on Computer Vision (ICCV) (2015)
[45] Liu, Z., Luo, P., Wang, X., Tang, X.: Deep learning face attributes in the wild. In: International Conference on Computer Vision (ICCV) (2015)
[46] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
[47] Lu, C., Tang, X.: Surpassing human-level face verification performance on LFW with GaussianFace. arXiv preprint arXiv:1404.3840 (2014)
[48] Luo, P., Wang, X., Tang, X.: A deep sum-product architecture for robust facial attributes analysis. In: International Conference on Computer Vision (ICCV) (2013)
[49] Mathias, M., Benenson, R., Pedersoli, M., Van Gool, L.: Face detection without bells and whistles. In: European Conference on Computer Vision (ECCV) (2014)
[50] McCullagh, P., Nelder, J.A.: Generalized linear models. Chapman and Hall, London (1989)
[51] Mnih, V., Hinton, G.: Learning to label aerial images from noisy data. In: International Conference on Machine Learning (ICML) (2012)
[52] Ren, S., Cao, X., Wei, Y., Sun, J.: Face alignment at 3000 fps via regressing local binary features. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[53] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Conference on Neural Information Processing Systems (NIPS) (2015)
[54] Sagonas, C., Tzimiropoulos, G., Zafeiriou, S., Pantic, M.: 300 faces in-the-wild challenge: the first facial landmark localization challenge. In: International Conference on Computer Vision Workshop (2013)
[55] Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
[56] Shen, X., Lin, Z., Brandt, J., Wu, Y.: Detecting and aligning faces by image retrieval. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[57] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
[58] Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[59] Sun, Y., Chen, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. In: Conference on Neural Information Processing Systems (NIPS) (2014)
[60] Sun, Y., Wang, X., Tang, X.: Deeply learned face representations are sparse, selective, and robust. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
[61] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
[62] Taigman, Y., Yang, M., Ranzato, M., Wolf, L.: DeepFace: closing the gap to human-level performance in face verification. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[63] Tzimiropoulos, G., Pantic, M.: Gauss-Newton deformable part models for face alignment in-the-wild. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[64] Uijlings, J.R., van de Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013)
[65] Wang, J., Yang, J., Yu, K., Lv, F., Huang, T., Gong, Y.: Locality-constrained linear coding for image classification. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
[66] Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[67] Wang, N., Tao, D., Gao, X., Li, X., Li, J.: A comprehensive survey to face hallucination. Int. J. Comput. Vis. 106(1), 9–30 (2014)
[68] Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10, 207–244 (2009)
[69] Xiao, J., Hays, J., Ehinger, K.A., Oliva, A., Torralba, A.: SUN database: large-scale scene recognition from abbey to zoo. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2010)
[70] Xiong, X., De la Torre, F.: Supervised descent method and its applications to face alignment. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[71] Yan, J., Lei, Z., Wen, L., Li, S.Z.: The fastest deformable part model for object detection. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[72] Yan, J., Zhang, X., Lei, Z., Li, S.Z.: Face detection by structural models. Image Vis. Comput. 32(10), 790–799 (2014)
[73] Yang, B., Yan, J., Lei, Z., Li, S.Z.: Aggregate channel features for multi-view face detection. In: International Joint Conference on Biometrics (IJCB) (2014)
[74] Yang, C.Y., Liu, S., Yang, M.H.: Structured face hallucination. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2013)
[75] Yang, H., Patras, I.: Sieving regression forest votes for facial feature detection in the wild. In: International Conference on Computer Vision (ICCV) (2013)
[76] Yang, H., Jia, X., Loy, C.C., Robinson, P.: An empirical study of recent face alignment methods. arXiv preprint arXiv:1511.05049 (2015)
[77] Yang, S., Luo, P., Loy, C.C., Tang, X.: From facial parts responses to face detection: a deep learning approach. In: International Conference on Computer Vision (ICCV) (2015)
[78] Yang, S., Luo, P., Loy, C.C., Tang, X.: WIDER FACE: a face detection benchmark. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[79] Yu, X., Huang, J., Zhang, S., Yan, W., Metaxas, D.N.: Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In: International Conference on Computer Vision (ICCV) (2013)
[80] Zhang, J., Shan, S., Kan, M., Chen, X.: Coarse-to-fine auto-encoder networks (CFAN) for real-time face alignment. In: European Conference on Computer Vision (ECCV) (2014)
[81] Zhang, N., Paluri, M., Ranzato, M., Darrell, T., Bourdev, L.: PANDA: pose aligned networks for deep attribute modeling. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2014)
[82] Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Facial landmark detection by deep multi-task learning. In: European Conference on Computer Vision (ECCV) (2014)
[83] Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Learning deep representation for face alignment with auxiliary attributes. IEEE Trans. Pattern Anal. Mach. Intell. 38(5), 918–930 (2015)
[84] Zhang, Z., Luo, P., Loy, C.C., Tang, X.: Learning social relation traits from face images. In: International Conference on Computer Vision (ICCV) (2015)
[85] Zhu, S., Li, C., Loy, C.C., Tang, X.: Face alignment by coarse-to-fine shape searching. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2015)
[86] Zhu, S., Li, C., Loy, C.C., Tang, X.: Unconstrained face alignment via cascaded compositional learning. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
[87] Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)
[88] Zitnick, C.L., Dollár, P.: Edge boxes: locating object proposals from edges. In: European Conference on Computer Vision (ECCV) (2014)

Author information

Authors and Affiliations

Department of Information Engineering, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
Chen Change Loy & Ping Luo
Robotics Institute, Carnegie Mellon University, Pittsburgh, United States
Chen Huang

Corresponding author

Correspondence to Chen Change Loy.

Editor information

Editors and Affiliations

IBM T.J. Watson Research Center, Yorktown Heights, New York, USA
Rogerio Schmidt Feris
IST Austria Computer Vision and Machine Learning, Klosterneuburg, Austria
Christoph Lampert
Virginia Tech Electrical and Computer Engineering, Blacksburg, Virginia, USA
Devi Parikh

About this chapter

Cite this chapter

Loy, C.C., Luo, P., Huang, C. (2017). Deep Learning Face Attributes for Detection and Alignment. In: Feris, R., Lampert, C., Parikh, D. (eds) Visual Attributes. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-50077-5_8

DOI: https://doi.org/10.1007/978-3-319-50077-5_8
Published: 22 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50075-1
Online ISBN: 978-3-319-50077-5
eBook Packages: Computer Science, Computer Science (R0)