Region-Semantics Preserving Image Synthesis

  • Kang-Jun Liu
  • Tsu-Jui Fu
  • Shan-Hung Wu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11364)

Abstract

We study the problem of region-semantics preserving (RSP) image synthesis. Given a reference image and a region specification R, our goal is to train a model that is able to generate realistic and diverse images, each preserving the same semantics as that of the reference image within the region R. This problem is challenging because the model needs to (1) understand and preserve the marginal semantics of the reference region, i.e., the semantics excluding that of any subregion; and (2) maintain the compatibility of any synthesized region with the marginal semantics of the reference region. In this paper, we propose a novel model, called the fast region-semantics preserver (Fast-RSPer), for the RSP image synthesis problem. The Fast-RSPer uses a pre-trained GAN generator and a pre-trained deep feature extractor to generate images without undergoing a dedicated training phase. This makes it particularly useful for interactive applications. We conduct extensive experiments using real-world datasets, and the results show that Fast-RSPer can synthesize realistic, diverse RSP images efficiently.
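The abstract does not spell out how the images are generated, so the following is only a toy sketch of one way a frozen generator and a frozen feature extractor could be combined without any training: search over the latent code until the features inside the region match those of the reference. Everything here is an assumption made for illustration, not the authors' implementation: the random linear maps `W_gen` and `W_feat` stand in for the pre-trained networks, the region R is modeled as a binary mask over feature units, and `synthesize` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
IMG_DIM, LATENT_DIM, FEAT_DIM = 16, 8, 16

# Hypothetical frozen "pre-trained" networks, replaced here by random linear maps.
W_gen = rng.normal(size=(IMG_DIM, LATENT_DIM)) / np.sqrt(LATENT_DIM)
W_feat = rng.normal(size=(FEAT_DIM, IMG_DIM)) / np.sqrt(IMG_DIM)


def generator(z):
    """Toy generator: latent code -> flattened image."""
    return W_gen @ z


def features(x):
    """Toy feature extractor: flattened image -> feature vector."""
    return W_feat @ x


def synthesize(reference, region_mask, z_init, steps=3000):
    """Optimize only the latent code so that the features of G(z) match
    those of the reference inside the masked region; no weights are updated."""
    target = region_mask * features(reference)
    A = region_mask[:, None] * (W_feat @ W_gen)   # masked features as a function of z
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # safe step size for this quadratic loss
    z = z_init.copy()
    for _ in range(steps):
        residual = A @ z - target
        z -= step * (A.T @ residual)              # gradient step on 0.5 * ||residual||^2
    return generator(z)


reference = generator(rng.normal(size=LATENT_DIM))
region_mask = np.zeros(FEAT_DIM)
region_mask[:4] = 1.0                             # region R: the first four feature units

# Different starting codes give different images that agree with the reference on R.
img_a = synthesize(reference, region_mask, rng.normal(size=LATENT_DIM))
img_b = synthesize(reference, region_mask, rng.normal(size=LATENT_DIM))
```

In this toy setting the masked loss leaves part of the latent space unconstrained, so each random starting code ends at a different image. This mirrors, in a very simplified form, the abstract's requirement that outputs be diverse while preserving the region semantics.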

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. National Tsing Hua University, Hsinchu, Taiwan, R.O.C.