Dual-Agent Deep Reinforcement Learning for Deformable Face Tracking

  • Minghao Guo
  • Jiwen Lu
  • Jie Zhou
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)


In this paper, we propose a dual-agent deep reinforcement learning (DADRL) method for deformable face tracking, which generates bounding boxes and detects facial landmarks interactively from face videos. Most existing deformable face tracking methods learn models for these two tasks individually and perform the two procedures sequentially during the testing phase, which ignores the intrinsic connections between the tasks. Motivated by the fact that the performance of facial landmark detection depends heavily on the accuracy of the generated bounding boxes, we exploit the interactions of these two tasks in a probabilistic manner by following a Bayesian model and propose a unified framework for simultaneous bounding box tracking and landmark detection. By formulating the problem as a Markov decision process, we define two agents that exploit these relationships and pass messages via an adaptive sequence of actions under a deep reinforcement learning framework to iteratively adjust the positions of the bounding boxes and facial landmarks. Our proposed DADRL achieves performance improvements over the state-of-the-art deformable face tracking methods on the most challenging category of the 300-VW dataset.
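
The alternating adjustment described above can be pictured with a toy loop. The following is only a minimal sketch under strong assumptions, not the authors' implementation: the learned deep policies are replaced by hand-written stand-ins, the image evidence is reduced to a known point set (`observed`), and the message each agent receives is simply the other agent's current estimate. All names (`BOX_ACTIONS`, `enclosing_box`, `landmark_agent_step`, `box_agent_step`, `track_frame`) and the parameters `pull` and `couple` are hypothetical and chosen for illustration.

```python
import numpy as np

# Assumed discrete action set for the box agent: (dx, dy, relative scale change).
# The last entry is a "stop" action that leaves the box unchanged.
BOX_ACTIONS = [(-1, 0, 0.0), (1, 0, 0.0), (0, -1, 0.0), (0, 1, 0.0),
               (0, 0, -0.05), (0, 0, 0.05), (0, 0, 0.0)]
STOP = len(BOX_ACTIONS) - 1


def enclosing_box(landmarks):
    """Return (cx, cy, size) of the square box enclosing a set of (x, y) points."""
    lo, hi = landmarks.min(axis=0), landmarks.max(axis=0)
    cx, cy = (lo + hi) / 2.0
    return np.array([cx, cy, (hi - lo).max()])


def landmark_agent_step(landmarks, box, observed, pull=0.5, couple=0.25):
    """Stand-in for a learned landmark policy (an assumption, not the paper's model).

    Moves the landmarks part-way toward the observation and shifts the whole
    shape toward the centre of the box received from the box agent.
    """
    shift = box[:2] - enclosing_box(landmarks)[:2]
    return landmarks + pull * (observed - landmarks) + couple * shift


def box_agent_step(box, landmarks):
    """Stand-in for a learned box policy (an assumption, not the paper's model).

    Greedily picks the discrete action that brings the box closest to the
    box implied by the landmarks received from the landmark agent.
    """
    message = enclosing_box(landmarks)
    candidates = [box + np.array([dx, dy, ds * box[2]]) for dx, dy, ds in BOX_ACTIONS]
    best = int(np.argmin([np.abs(c - message).sum() for c in candidates]))
    return candidates[best], best == STOP


def track_frame(box, landmarks, observed, max_iters=100, tol=1e-3):
    """Alternate the two agents until the box agent stops and the landmarks settle."""
    for _ in range(max_iters):
        new_landmarks = landmark_agent_step(landmarks, box, observed)
        moved = np.abs(new_landmarks - landmarks).max()
        landmarks = new_landmarks
        box, stopped = box_agent_step(box, landmarks)
        if stopped and moved < tol:
            break
    return box, landmarks


# Toy data: three "landmarks" standing in for image evidence in the current frame.
observed = np.array([[40.0, 50.0], [60.0, 50.0], [50.0, 70.0]])
init_landmarks = observed - 5.0            # previous-frame estimate, offset by 5 px
init_box = enclosing_box(init_landmarks)   # box carried over from the previous frame
box, landmarks = track_frame(init_box, init_landmarks, observed)
print(box)
print(landmarks)
```

In this sketch each agent's update depends on the other's latest estimate, so a poorly placed box slows the landmark update and vice versa. A learned version would replace the greedy rule and the fixed `pull`/`couple` weights with policies trained from rewards, which this toy does not attempt.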


Keywords: Deformable face tracking, Reinforcement learning, Deep learning



This work was supported in part by the National Key Research and Development Program of China under Grant 2017YFA0700802, in part by the National Natural Science Foundation of China under Grants 61822603, 61672306, U1713214, and 61572271, and in part by the Shenzhen Fundamental Research Fund (Subject Arrangement) under Grant JCYJ20170412170602564.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Tsinghua University, Beijing, China
