Facial Landmark Detection by Deep Multi-task Learning
Abstract
Facial landmark detection has long been impeded by the problems of occlusion and pose variation. Instead of treating the detection task as a single and independent problem, we investigate the possibility of improving detection robustness through multi-task learning. Specifically, we wish to optimize facial landmark detection together with heterogeneous but subtly correlated tasks, e.g., head pose estimation and facial attribute inference. This is non-trivial since different tasks have different learning difficulties and convergence rates. To address this problem, we formulate a novel tasks-constrained deep model with task-wise early stopping to facilitate learning convergence. Extensive evaluations show that the proposed tasks-constrained learning (i) outperforms existing methods, especially in dealing with faces with severe occlusion and pose variation, and (ii) drastically reduces model complexity compared to the state-of-the-art method based on a cascaded deep model [21].
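The abstract's central idea can be illustrated with a minimal sketch: landmark detection is trained jointly with auxiliary tasks, and each auxiliary task is halted on its own schedule. The sketch assumes the joint objective is a weighted sum of a least-squares landmark loss and cross-entropy auxiliary losses. It also substitutes Prechelt's generalization-loss criterion [20] for the paper's own task-wise stopping rule, which is not given here. The names `AuxTask`, `landmark_loss`, `joint_loss`, and the weights and thresholds are hypothetical and used only for illustration.

```python
import math
from dataclasses import dataclass


def landmark_loss(pred, target):
    """Least-squares error over flattened (x, y) landmark coordinates (main task)."""
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, target))


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class for one auxiliary classification task."""
    return -math.log(max(probs[label], 1e-12))


@dataclass
class AuxTask:
    """Hypothetical bookkeeping for one auxiliary task (e.g. head pose, a facial attribute)."""
    name: str
    weight: float      # hypothetical importance weight for this task's loss
    threshold: float   # hypothetical generalization-loss threshold, in percent
    best_val: float = math.inf
    active: bool = True

    def update(self, val_error):
        """Record one epoch's validation error and stop the task once it starts to overfit.

        Stand-in criterion (Prechelt's generalization loss, not the paper's rule):
        GL = 100 * (E_val / E_opt - 1), where E_opt is the lowest validation error so far.
        """
        if not self.active:
            return
        self.best_val = min(self.best_val, val_error)
        if self.best_val > 0:
            gl = 100.0 * (val_error / self.best_val - 1.0)
        else:
            gl = math.inf if val_error > 0 else 0.0
        if gl > self.threshold:
            self.active = False  # this task no longer contributes to the joint loss


def joint_loss(landmark_pred, landmark_gt, aux_probs, aux_labels, tasks):
    """Main-task loss plus the weighted losses of auxiliary tasks that are still active."""
    loss = landmark_loss(landmark_pred, landmark_gt)
    for task in tasks:
        if task.active:
            loss += task.weight * cross_entropy(aux_probs[task.name], aux_labels[task.name])
    return loss
```

As an example, take a threshold of 5. An auxiliary task whose validation errors run 0.50, 0.40, 0.38, 0.41 is deactivated after the fourth epoch, because 0.41 is roughly 7.9% above the best value of 0.38. The landmark term is never switched off, so training of the main task continues while the auxiliary terms drop out individually.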
Keywords
Face Image, Convolutional Neural Network, Related Task, Deep Neural Network, Facial Landmark
References
- 1. Asthana, A., Zafeiriou, S., Cheng, S., Pantic, M.: Robust discriminative response map fitting with constrained local models. In: CVPR, pp. 3444–3451 (2013)
- 2. Belhumeur, P.N., Jacobs, D.W., Kriegman, D.J., Kumar, N.: Localizing parts of faces using a consensus of exemplars. In: CVPR, pp. 545–552 (2011)
- 3. Burgos-Artizzu, X.P., Perona, P., Dollár, P.: Robust face landmark estimation under occlusion. In: ICCV, pp. 1513–1520 (2013)
- 4. Cao, X., Wei, Y., Wen, F., Sun, J.: Face alignment by explicit shape regression. In: CVPR, pp. 2887–2894 (2012)
- 5. Caruana, R.: Multitask learning. Machine Learning 28(1), 41–75 (1997)
- 6. Chen, K., Gong, S., Xiang, T., Loy, C.C.: Cumulative attribute space for age and crowd density estimation. In: CVPR, pp. 2467–2474 (2013)
- 7. Collobert, R., Weston, J.: A unified architecture for natural language processing: Deep neural networks with multitask learning. In: ICML, pp. 160–167 (2008)
- 8. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active appearance models. PAMI 23(6), 681–685 (2001)
- 9. Cootes, T.F., Ionita, M.C., Lindner, C., Sauer, P.: Robust and accurate shape model fitting using random forest regression voting. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part VII. LNCS, vol. 7578, pp. 278–291. Springer, Heidelberg (2012)
- 10. Dantone, M., Gall, J., Fanelli, G., Van Gool, L.: Real-time facial feature detection using conditional regression forests. In: CVPR, pp. 2578–2585 (2012)
- 11. Köstinger, M., Wohlhart, P., Roth, P.M., Bischof, H.: Annotated facial landmarks in the wild: A large-scale, real-world database for facial landmark localization. In: ICCV Workshops, pp. 2144–2151 (2011)
- 12. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS (2012)
- 13. Li, H., Shen, C., Shi, Q.: Real-time visual tracking using compressive sensing. In: CVPR, pp. 1305–1312 (2011)
- 14. Liu, X.: Generic face alignment using boosted appearance model. In: CVPR (2007)
- 15. Lu, C., Tang, X.: Surpassing human-level face verification performance on LFW with GaussianFace. Tech. rep., arXiv:1404.3840 (2014)
- 16. Luo, P., Wang, X., Tang, X.: Hierarchical face parsing via deep learning. In: CVPR, pp. 2480–2487 (2012)
- 17. Luo, P., Wang, X., Tang, X.: A deep sum-product architecture for robust facial attributes analysis. In: CVPR, pp. 2864–2871 (2013)
- 18. Luxand Incorporated: Luxand face SDK, http://www.luxand.com/
- 19. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML, pp. 807–814 (2010)
- 20. Prechelt, L.: Automatic early stopping using cross validation: quantifying the criteria. Neural Networks 11(4), 761–767 (1998)
- 21. Sun, Y., Wang, X., Tang, X.: Deep convolutional network cascade for facial point detection. In: CVPR, pp. 3476–3483 (2013)
- 22. Sun, Y., Wang, X., Tang, X.: Deep learning face representation by joint identification-verification. Tech. rep., arXiv:1406.4773 (2014)
- 23. Sun, Y., Wang, X., Tang, X.: Deep learning face representation from predicting 10,000 classes. In: CVPR (2014)
- 24. Valstar, M., Martinez, B., Binefa, X., Pantic, M.: Facial point detection using boosted regression and graph models. In: CVPR, pp. 2729–2736 (2010)
- 25. Xiong, X., De La Torre, F.: Supervised descent method and its applications to face alignment. In: CVPR, pp. 532–539 (2013)
- 26. Yang, H., Patras, I.: Sieving regression forest votes for facial feature detection in the wild. In: ICCV, pp. 1936–1943 (2013)
- 27. Yu, X., Huang, J., Zhang, S., Yan, W., Metaxas, D.N.: Pose-free facial landmark fitting via optimized part mixtures and cascaded deformable shape model. In: ICCV, pp. 1944–1951 (2013)
- 28. Yuan, X.T., Liu, X., Yan, S.: Visual classification with multitask joint sparse representation. TIP 21(10), 4349–4360 (2012)
- 29. Zhang, T., Ghanem, B., Liu, S., Ahuja, N.: Robust visual tracking via structured multi-task sparse learning. IJCV 101(2), 367–383 (2013)
- 30. Zhang, Y., Yeung, D.Y.: A convex formulation for learning task relationships in multi-task learning. In: UAI (2011)
- 31. Zhang, Z., Zhang, W., Liu, J., Tang, X.: Facial landmark localization based on hierarchical pose regression with cascaded random ferns. In: ACM Multimedia, pp. 561–564 (2013)
- 32. Zhu, X., Ramanan, D.: Face detection, pose estimation, and landmark localization in the wild. In: CVPR, pp. 2879–2886 (2012)
- 33. Zhu, Z., Luo, P., Wang, X., Tang, X.: Deep learning identity-preserving face space. In: ICCV, pp. 113–120 (2013)
- 34. Zhu, Z., Luo, P., Wang, X., Tang, X.: Deep learning multi-view representation for face recognition. Tech. rep., arXiv:1406.6947 (2014)
- 35. Zhu, Z., Luo, P., Wang, X., Tang, X.: Recover canonical-view faces in the wild with deep neural networks. Tech. rep., arXiv:1404.3543 (2014)