
ReenactGAN: Learning to Reenact Faces via Boundary Transfer

  • Wayne Wu
  • Yunxuan Zhang
  • Cheng Li
  • Chen Qian
  • Chen Change Loy
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

We present a novel learning-based framework for face reenactment. The proposed method, ReenactGAN, is capable of transferring facial movements and expressions from an arbitrary person's monocular video input to a target person's video. Instead of performing a direct transfer in the pixel space, which could result in structural artifacts, we first map the source face onto a boundary latent space. A transformer is subsequently used to adapt the source face's boundary to the target's boundary. Finally, a target-specific decoder is used to generate the reenacted target face. Thanks to this effective and reliable boundary-based transfer, our method can perform photo-realistic face reenactment. In addition, ReenactGAN is appealing in that the whole pipeline is purely feed-forward, and thus reenactment runs in real time (30 FPS on one GTX 1080 GPU). The dataset and models are publicly available on our project page: https://wywu.github.io/projects/ReenactGAN/ReenactGAN.html.
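As a concrete illustration of the three feed-forward stages described above (boundary encoder, boundary transformer, target-specific decoder), the following is a minimal PyTorch sketch. The module names, layer widths, and the number of boundary heatmaps are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of ReenactGAN's three-stage feed-forward pipeline.
# Layer choices and the boundary-heatmap count are illustrative only.
import torch
import torch.nn as nn

class BoundaryEncoder(nn.Module):
    """Maps a face image to a stack of facial-boundary heatmaps."""
    def __init__(self, num_boundaries=15):  # heatmap count is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_boundaries, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

class BoundaryTransformer(nn.Module):
    """Adapts source boundaries toward the target person's boundary space."""
    def __init__(self, num_boundaries=15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_boundaries, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, num_boundaries, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, b):
        return self.net(b)

class TargetDecoder(nn.Module):
    """Target-specific decoder: renders the reenacted target face."""
    def __init__(self, num_boundaries=15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_boundaries, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, b):
        return self.net(b)

encoder, transformer, decoder = BoundaryEncoder(), BoundaryTransformer(), TargetDecoder()
source_frame = torch.randn(1, 3, 256, 256)   # one RGB frame from the source video
with torch.no_grad():                        # inference is purely feed-forward
    boundary = encoder(source_frame)         # source face -> boundary space
    adapted = transformer(boundary)          # source boundary -> target boundary
    reenacted = decoder(adapted)             # target boundary -> reenacted face
print(reenacted.shape)                       # torch.Size([1, 3, 256, 256])
```

Because every stage is a single convolutional forward pass with no iterative optimization, the pipeline stays feed-forward at test time, which is what makes the real-time throughput quoted in the abstract plausible.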

Keywords

Face reenactment · Face generation · Face alignment · GAN

Notes

Acknowledgment

We would like to thank Kwan-Yee Lin for insightful discussion, and Tong Li, Yue He, and Lichen Zhou for their exceptional support. This work was supported by SenseTime Research.

Supplementary material

Supplementary material 1 (PDF, 571 KB)

Supplementary material 2 (AVI, 36,111 KB)


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. SenseTime Research, Beijing, China
  2. Nanyang Technological University, Singapore
