Identity-Preserving Face Recovery from Stylized Portraits

International Journal of Computer Vision

Abstract

Given an artistic portrait, recovering the latent photorealistic face that preserves the subject’s identity is challenging because facial details are often distorted or entirely lost in artistic portraits. We develop an Identity-preserving Face Recovery from Portraits (IFRP) method that consists of a Style Removal Network (SRN) and a Discriminative Network (DN). The SRN, an autoencoder with residual block-embedded skip connections, is designed to map feature maps of stylized images to the feature maps of the corresponding photorealistic faces. Owing to embedded Spatial Transformer Networks (STNs), the SRN automatically compensates for misalignments of stylized portraits and outputs aligned realistic face images. To ensure identity preservation, we encourage the recovered and ground-truth faces to share similar visual features via a distance measure on features extracted by a pre-trained FaceNet network. The DN, composed of multiple convolutional and fully-connected layers, enforces the recovered faces to resemble authentic faces. As a result, we can recover high-quality photorealistic faces from unaligned portraits while preserving the identity of the depicted subject. Extensive evaluations on a large-scale synthesized dataset and a hand-drawn sketch dataset demonstrate that our method attains state-of-the-art face recovery results. In addition, our method can recover photorealistic faces from unseen stylized portraits, artistic paintings, and hand-drawn sketches.

References

  • Archibald Prize, Art Gallery of NSW. https://www.artgallery.nsw.gov.au/prizes/archibald/ (2017).

  • Chen, D., Yuan, L., Liao, J., Yu, N., & Hua, G. (2017). Stylebank: An explicit representation for neural image style transfer. arXiv preprint arXiv:1703.09210.

  • Chen, T. Q., & Schmidt, M. (2016). Fast patch-based style transfer of arbitrary style. arXiv preprint arXiv:1612.04337.

  • Denton, E. L., Chintala, S., Fergus, R., et al. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. In NIPS.

  • Dumoulin, V., Shlens, J., & Kudlur, M. (2016). A learned representation for artistic style. arXiv preprint arXiv:1610.07629.

  • Gatys, L. A., Bethge, M., Hertzmann, A., & Shechtman, E. (2016). Preserving color in neural artistic style transfer. arXiv preprint arXiv:1606.05897.

  • Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. In CVPR.

  • Gatys, L. A., Ecker, A. S., Bethge, M., Hertzmann, A., & Shechtman, E. (2016). Controlling perceptual factors in neural style transfer. arXiv preprint arXiv:1611.07865.

  • Goodfellow, I., Pouget-Abadie, J., & Mirza, M. (2014). Generative Adversarial Networks. In NIPS.

  • Gupta, A., Johnson, J., Alahi, A., & Fei-Fei, L. (2017). Characterizing and improving stability in neural style transfer. arXiv preprint arXiv:1705.02092.

  • Hinton, G. Neural networks for machine learning, Lecture 6a: Overview of mini-batch gradient descent.

  • Huang, R., Zhang, S., Li, T., & He, R. (2017). Beyond face rotation: Global and local perception gan for photorealistic and identity preserving frontal view synthesis. arXiv preprint arXiv:1704.04086.

  • Huang, X., & Belongie, S. (2017). Arbitrary style transfer in real-time with adaptive instance normalization. arXiv preprint arXiv:1703.06868.

  • Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2016). Image-to-image translation with conditional adversarial networks. arXiv preprint arXiv:1611.07004.

  • Jaderberg, M., Simonyan, K., Zisserman, A., et al. (2015). Spatial transformer networks. In NIPS (pp. 2017–2025).

  • Jayasumana, S., Hartley, R., Salzmann, M., Li, H., & Harandi, M. (2013). Kernel methods on the riemannian manifold of symmetric positive definite matrices. In CVPR.

  • Johnson, J., Alahi, A., & Fei-Fei, L. (2016). Perceptual losses for real-time style transfer and super-resolution. In ECCV. Springer.

  • Karacan, L., Akata, Z., Erdem, A., & Erdem, E. (2016). Learning to generate images of outdoor scenes from attributes and semantic layouts. arXiv preprint arXiv:1612.00215.

  • Kazemi, H., Iranmanesh, M., Dabouei, A., Soleymani, S., & Nasrabadi, N.M. (2018). Facial attributes guided deep sketch-to-photo synthesis. In 2018 IEEE winter applications of computer vision workshops (WACVW) (pp. 1–8). IEEE.

  • Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.

  • Koniusz, P., Tas, Y., Zhang, H., Harandi, M., Porikli, F., & Zhang, R. (2018). Museum exhibit identification challenge for the supervised domain adaptation and beyond. In ECCV (pp. 788–804).

  • Koniusz, P., Yan, F., Gosselin, P., & Mikolajczyk, K. (2016). Higher-order occurrence pooling for bags-of-words: Visual concept detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(2), 313–326.

  • Koniusz, P., Zhang, H., & Porikli, F. (2018). A deeper look at power normalizations. In IEEE Conference on Computer Vision and Pattern Recognition (pp. 5774–5783).

  • Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al. (2016). Photo-realistic single image super-resolution using a generative adversarial network. arXiv preprint arXiv:1609.04802.

  • Li, C., & Wand, M. (2016a). Combining markov random fields and convolutional neural networks for image synthesis. In CVPR.

  • Li, C., & Wand, M. (2016b). Precomputed real-time texture synthesis with markovian generative adversarial networks. In ECCV (pp. 702–716). Springer.

  • Li, Y., Fang, C., Yang, J., Wang, Z., Lu, X., & Yang, M.H. (2017). Diversified texture synthesis with feed-forward networks. arXiv preprint arXiv:1703.01664.

  • Liu, Z., Luo, P., Wang, X., & Tang, X. (2015). Deep learning face attributes in the wild. In ICCV.

  • Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. In CVPR.

  • Maas, A. L., Hannun, A. Y., & Ng, A. Y. (2013). Rectifier nonlinearities improve neural network acoustic models. In ICML.

  • Nejati, H., & Sim, T. (2011). A study on recognizing non-artistic face sketches. In WACV. IEEE.

  • Oord, A. V. D., Kalchbrenner, N., & Kavukcuoglu, K. (2016). Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759.

  • Parkhi, O. M., Vedaldi, A., & Zisserman, A. (2015). Deep face recognition. In BMVC.

  • Pathak, D., Krahenbuhl, P., Donahue, J., Darrell, T., & Efros, A. A. (2016). Context encoders: Feature learning by inpainting. In CVPR.

  • Phillips, P. J., Wechsler, H., Huang, J., & Rauss, P. J. (1998). The feret database and evaluation procedure for face-recognition algorithms. Image and Vision Computing, 16(5), 295–306.

  • Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.

  • Ronneberger, O., Fischer, P., & Brox, T. (2015). U-net: Convolutional networks for biomedical image segmentation. In MICCAI. Springer.

  • Sangkloy, P., Lu, J., Fang, C., Yu, F., & Hays, J. (2016). Scribbler: Controlling deep image synthesis with sketch and color. arXiv preprint arXiv:1612.00835.

  • Schroff, F., Kalenichenko, D., & Philbin, J. (2015). Facenet: A unified embedding for face recognition and clustering. In CVPR.

  • Selim, A., Elgharib, M., & Doyle, L. (2016). Painting style transfer for head portraits using convolutional neural networks. ACM Transactions on Graphics (TOG), 35(4), 129.

  • Sharma, A., & Jacobs, D. W. (2011). Bypassing synthesis: Pls for face recognition with pose, low-resolution and sketch. In CVPR. IEEE.

  • Shiri, F., Yu, X., Koniusz, P., & Porikli, F. (2017). Face destylization. In International conference on digital image computing: Techniques and applications (DICTA). https://doi.org/10.1109/DICTA.2017.8227432.

  • Shiri, F., Yu, X., Porikli, F., Hartley, R., & Koniusz, P. (2018). Identity-preserving face recovery from portraits. In WACV (pp. 102–111). https://doi.org/10.1109/WACV.2018.00018.

  • Shiri, F., Yu, X., Porikli, F., Hartley, R., & Koniusz, P. (2019). Recovering faces from portraits with auxiliary facial attributes. In WACV.

  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

  • Ulyanov, D., Lebedev, V., Vedaldi, A., & Lempitsky, V. S. (2016a). Texture networks: Feed-forward synthesis of textures and stylized images. In ICML.

  • Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2016b). Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022.

  • Ulyanov, D., Vedaldi, A., & Lempitsky, V. (2017). Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis. arXiv preprint arXiv:1701.02096.

  • Wang, L., Sindagi, V., & Patel, V. (2018a). High-quality facial photo-sketch synthesis using multi-adversarial networks. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018) (pp. 83–90). IEEE.

  • Wang, N., Zha, W., Li, J., & Gao, X. (2018b). Back projection: An effective postprocessing method for gan-based face sketch synthesis. Pattern Recognition Letters, 107, 59–65.

  • Wang, X., Oxholm, G., Zhang, D., & Wang, Y.F. (2016). Multimodal transfer: A hierarchical deep convolutional neural network for fast artistic style transfer. arXiv preprint arXiv:1612.01895.

  • Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13(4), 600–612.

  • Wilmot, P., Risser, E., & Barnes, C. (2017). Stable and controllable neural texture synthesis and style transfer using histogram losses. arXiv preprint arXiv:1701.08893.

  • Yin, R. (2016). Content aware neural style transfer. arXiv preprint arXiv:1601.04568.

  • Yu, F., Zhang, Y., Song, S., Seff, A., & Xiao, J. (2015). Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365.

  • Yu, X., & Porikli, F. (2016). Ultra-resolving face images by discriminative generative networks. In ECCV.

  • Yu, X., & Porikli, F. (2017a). Face hallucination with tiny unaligned images by transformative discriminative neural networks. In AAAI.

  • Yu, X., & Porikli, F. (2017b). Hallucinating very low-resolution unaligned and noisy face images by transformative discriminative autoencoders. In CVPR.

  • Zhang, H., & Dana, K. (2017). Multi-style generative network for real-time transfer. arXiv preprint arXiv:1703.06953.

  • Zhang, H., Sindagi, V., & Patel, V. M. (2017). Image de-raining using a conditional generative adversarial network. arXiv preprint arXiv:1701.05957.

  • Zhang, L., Zhang, L., Mou, X., Zhang, D., et al. (2011a). FSIM: A feature similarity index for image quality assessment. IEEE Transactions on Image Processing, 20(8), 2378–2386.

  • Zhang, W., Wang, X., & Tang, X. (2011b). Coupled information-theoretic encoding for face photo-sketch recognition. In CVPR. IEEE.

  • Zhang, Z., Luo, P., Loy, C. C., & Tang, X. (2014). Facial landmark detection by deep multi-task learning. In ECCV. Springer.

  • Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593.

Acknowledgements

This work is supported by the Australian Research Council (ARC) Grant DP150104645.

Author information

Corresponding author

Correspondence to Piotr Koniusz.

Additional information

Communicated by Dr. Rama Chellappa, Dr. Xiaoming Liu, Dr. Tae-Kyun Kim, Dr. Fernando De la Torre and Dr. Chen Change Loy.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Face Alignment: Spatial Transformer Networks (STN)

As described in Sect. 3.1, we incorporate multiple STNs (Jaderberg et al. 2015) as intermediate layers to compensate for misalignments and in-plane rotations. The STN layers estimate the motion parameters of face images and warp them to a canonical view. Each STN consists of a localization module, a grid generator and a sampler. The localization module uses several hidden layers to estimate the transformation parameters with respect to the canonical view. The grid generator creates a sampling grid according to the estimated parameters. Finally, the sampler maps the input feature maps onto the generated grid using bilinear interpolation. The architecture of our STN layers is detailed in Tables 8, 9, 10 and 11.
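As an illustration only, a minimal PyTorch sketch of one such STN layer is shown below; the layer widths, kernel sizes and feature-map resolution are assumptions and do not reproduce the exact STN1–STN4 configurations of Tables 8, 9, 10 and 11.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STNLayer(nn.Module):
    """Minimal spatial transformer: localization net -> grid generator -> sampler.
    All layer sizes are illustrative placeholders, not the paper's settings."""

    def __init__(self, in_channels=64, feat_size=32):
        super().__init__()
        # Localization module: conv/FC layers regressing 6 affine parameters.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
        )
        self.fc_loc = nn.Sequential(
            nn.Linear(32 * (feat_size // 4) ** 2, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 6),  # parameters of a 2x3 affine matrix
        )
        # Start from the identity transform, i.e. no warping at initialization.
        self.fc_loc[2].weight.data.zero_()
        self.fc_loc[2].bias.data.copy_(
            torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x):
        # 1) Localization: estimate the affine motion parameters theta.
        theta = self.fc_loc(self.localization(x).flatten(1)).view(-1, 2, 3)
        # 2) Grid generator: build a sampling grid from theta.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        # 3) Sampler: warp the feature maps with bilinear interpolation.
        return F.grid_sample(x, grid, mode="bilinear", align_corners=False)
```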

Table 8 The STN1 architecture
Table 9 The STN2 architecture
Table 10 The STN3 architecture
Table 11 The STN4 architecture

Contributions of Each Component in the IFRP Network

In Sect. 3, we described the impact of the \(\ell _2\) loss, the adversarial loss and the identity-preserving loss on the face recovery from portraits. Figure 20 further shows the contribution of each loss function to the final results.
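For illustration, a minimal sketch of how these three loss terms could be combined on the generator side is given below; the module handles (srn, dn, facenet) and the weighting coefficients are hypothetical placeholders, not the exact formulation or hyperparameters used in the paper.

```python
import torch
import torch.nn.functional as F

def generator_loss(srn, dn, facenet, portraits, real_faces,
                   lambda_adv=1e-3, lambda_id=1e-2):
    """Illustrative combination of the pixel-wise, adversarial and
    identity-preserving losses. srn, dn and facenet stand for the Style Removal
    Network, the Discriminative Network and a frozen pre-trained face feature
    extractor; the lambda weights are assumed, not the paper's values."""
    recovered = srn(portraits)

    # Pixel-wise l2 loss: low-level appearance similarity to the ground truth.
    loss_pix = F.mse_loss(recovered, real_faces)

    # Adversarial loss: the discriminator should label recovered faces as real.
    logits_fake = dn(recovered)
    loss_adv = F.binary_cross_entropy_with_logits(
        logits_fake, torch.ones_like(logits_fake))

    # Identity-preserving loss: feature distance in a frozen face network.
    with torch.no_grad():
        target_feat = facenet(real_faces)
    loss_id = F.mse_loss(facenet(recovered), target_feat)

    return loss_pix + lambda_adv * loss_adv + lambda_id * loss_id
```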

Visual Comparison with the State of the Art

Below, we provide several additional results demonstrating the performance of our IFRP network compared to the state-of-the-art approaches (Fig. 21).

Fig. 20

More results showing the contribution of each loss function in our IFRP network. a Ground truth face images. b Input unaligned portraits from test dataset. c Recovered face images; the pixel-wise loss was used in training (no DN or identity-preserving losses). d Recovered face images; the pixel-wise loss and discriminative loss were used (no identity-preserving loss). e Our final results with the pixel-wise loss, discriminative loss and identity-preserving loss used during training

Fig. 21

Additional qualitative comparisons with the state-of-the-art methods. a Ground truth face images. b Input portraits (from the test dataset) including the seen styles Scream, Mosaic and Candy as well as the unseen styles Sketch, Composition VII, Feathers, Udnie and La Muse. c Gatys et al.’s method (2016). d Johnson et al.’s method (2016). e Li and Wand’s method (2016b) (MGAN). f Isola et al.’s method (2016) (pix2pix). g Zhu et al.’s method (2017) (CycleGAN). h Shiri et al.’s method (2017). i Our method

About this article

Cite this article

Shiri, F., Yu, X., Porikli, F. et al. Identity-Preserving Face Recovery from Stylized Portraits. Int J Comput Vis 127, 863–883 (2019). https://doi.org/10.1007/s11263-019-01169-1
