
Multi-style Generative Network for Real-Time Transfer

  • Hang Zhang
  • Kristin Dana
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

Despite rapid progress in style transfer, existing approaches that use a feed-forward generative network for multi-style or arbitrary-style transfer usually compromise image quality and model flexibility. We find that comprehensive style modeling is fundamentally difficult to achieve with a 1-dimensional style embedding. Motivated by this, we introduce the CoMatch Layer, which learns to match the second-order feature statistics of the content with those of the target styles. With the CoMatch Layer, we build a Multi-style Generative Network (MSG-Net) that achieves real-time performance. In addition, we employ a specific strategy of upsampled convolution that avoids the checkerboard artifacts caused by fractionally-strided convolution. Our method achieves superior image quality compared to state-of-the-art approaches. The proposed MSG-Net, as a general approach for real-time style transfer, is compatible with most existing techniques, including content-style interpolation, color preservation, spatial control, and brush-stroke size control. MSG-Net is the first to achieve real-time brush-size control in a purely feed-forward manner for style transfer. Our implementations and pre-trained models for the Torch, PyTorch, and MXNet frameworks will be publicly available (links can be found at http://hangzhang.org/).
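The CoMatch Layer can be pictured as a learned re-coloring of content features toward the Gram (second-order) statistics of the target style. Below is a minimal PyTorch sketch of that idea; the class name `CoMatch`, the `set_target` method, the identity initialization, and the single-style assumption are our own illustrative choices, not the authors' released implementation.

```python
import torch
import torch.nn as nn


def gram_matrix(feat):
    # feat: (B, C, H, W) -> (B, C, C) second-order channel statistics
    b, c, h, w = feat.size()
    f = feat.view(b, c, h * w)
    return f.bmm(f.transpose(1, 2)) / (c * h * w)


class CoMatch(nn.Module):
    """Sketch of a CoMatch-style layer: push content features toward
    the Gram statistics G(F_s) of a target style via a learned map W."""

    def __init__(self, channels):
        super().__init__()
        # Learnable matching weight W, initialized as identity
        self.weight = nn.Parameter(torch.eye(channels).unsqueeze(0))
        # Target style Gram matrix; identity until a style is set
        self.register_buffer('gram', torch.eye(channels).unsqueeze(0))

    def set_target(self, style_feat):
        # Cache the Gram matrix of the style image's features
        # (assumes one style image per call)
        self.gram = gram_matrix(style_feat)

    def forward(self, x):
        b, c, h, w = x.size()
        f = x.view(b, c, h * w)                      # Phi(F): (B, C, HW)
        wg = torch.matmul(self.weight, self.gram)    # W G(F_s), batch-broadcast
        out = torch.matmul(wg.transpose(-2, -1), f)  # (W G)^T Phi(F)
        return out.view(b, c, h, w)
```

In a full generator, `set_target` would be called with (e.g. VGG) features of the style image before stylizing; switching styles at test time then only requires recomputing one Gram matrix, which is what makes multi-style operation cheap in a single feed-forward network.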
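The upsampling strategy follows the resize-then-convolve recipe popularized by Odena et al.'s "Deconvolution and checkerboard artifacts": a stride-2 transposed convolution overlaps its kernel unevenly across output pixels and imprints a checkerboard pattern, whereas upsampling first and convolving afterwards covers every output pixel uniformly. A hedged sketch of such a block follows; the exact upsampled-convolution layer in MSG-Net may be parameterized differently.

```python
import torch.nn as nn


class UpsampleConv(nn.Module):
    """Upsample-then-convolve block (an illustrative stand-in for the
    paper's upsampled convolution). Nearest-neighbor upsampling followed
    by a regular convolution avoids the uneven kernel overlap that makes
    fractionally-strided convolutions produce checkerboard artifacts."""

    def __init__(self, in_ch, out_ch, kernel_size=3, scale=2):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=scale, mode='nearest')
        self.pad = nn.ReflectionPad2d(kernel_size // 2)  # preserve spatial size
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size)

    def forward(self, x):
        return self.conv(self.pad(self.upsample(x)))
```

A decoder built this way replaces each `nn.ConvTranspose2d(..., stride=2)` with an `UpsampleConv(..., scale=2)` at the same resolution step.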

Supplementary material

Supplementary material 1: 478824_1_En_32_MOESM1_ESM.mp4 (68.8 MB)


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Amazon AI, East Palo Alto, USA
  2. Rutgers University, New Brunswick, USA
