Diverse Image-to-Image Translation via Disentangled Representations

  • Hsin-Ying Lee
  • Hung-Yu Tseng
  • Jia-Bin Huang
  • Maneesh Singh
  • Ming-Hsuan Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11205)

Abstract

Image-to-image translation aims to learn the mapping between two visual domains. Many applications face two main challenges: (1) the lack of aligned training pairs and (2) multiple possible outputs from a single input image. In this work, we present an approach based on disentangled representations that produces diverse outputs without paired training images. To achieve diversity, we embed images into two spaces: a domain-invariant content space, which captures information shared across domains, and a domain-specific attribute space. Using the disentangled features as inputs greatly reduces mode collapse. To handle unpaired training data, we introduce a novel cross-cycle consistency loss. Qualitative results show that our model generates diverse and realistic images on a wide range of tasks, and we validate the effectiveness of our approach through extensive evaluation.
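
To make the two ideas in the abstract concrete, the following is a minimal PyTorch sketch of the disentangled embedding and the cross-cycle consistency loss. The module architectures and the names `ContentEncoder`, `AttributeEncoder`, `Generator`, and `cross_cycle_loss` are illustrative assumptions, not the authors' implementation; the sketch only shows how unpaired images are split into a shared content code and a domain-specific attribute code, translated by swapping attributes across domains, and then swapped back to reconstruct the inputs.

```python
import torch
import torch.nn as nn

class ContentEncoder(nn.Module):
    # Maps a 3xHxW image to a domain-invariant spatial content code.
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class AttributeEncoder(nn.Module):
    # Maps an image to a low-dimensional, domain-specific attribute code.
    def __init__(self, attr_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, attr_dim),
        )

    def forward(self, x):
        return self.net(x)

class Generator(nn.Module):
    # Decodes a (content, attribute) pair back into an image.
    def __init__(self, attr_dim=8):
        super().__init__()
        self.fc = nn.Linear(attr_dim, 128)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, content, attr):
        # Broadcast the attribute vector over the spatial content map.
        a = self.fc(attr)[:, :, None, None]
        return self.net(content + a)

def cross_cycle_loss(x, y, Ec_x, Ec_y, Ea_x, Ea_y, Gx, Gy):
    # First swap: exchange attribute codes across domains and translate.
    cx, cy = Ec_x(x), Ec_y(y)   # domain-invariant content codes
    ax, ay = Ea_x(x), Ea_y(y)   # domain-specific attribute codes
    u = Gy(cx, ay)              # x's content rendered in domain Y
    v = Gx(cy, ax)              # y's content rendered in domain X
    # Second swap: re-encode the translations and swap attributes back;
    # the results should reconstruct the original inputs.
    x_hat = Gx(Ec_y(u), Ea_x(v))
    y_hat = Gy(Ec_x(v), Ea_y(u))
    return (x - x_hat).abs().mean() + (y - y_hat).abs().mean()
```

Calling `cross_cycle_loss` on unpaired batches x and y from the two domains yields the reconstruction term; in the full model this would be combined with adversarial and other reconstruction objectives, which the sketch omits.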

Acknowledgements

This work is supported in part by the NSF CAREER Grant #1149783, the NSF Grant #1755785, and gifts from Verisk, Adobe and Nvidia.

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Hsin-Ying Lee (1)
  • Hung-Yu Tseng (1)
  • Jia-Bin Huang (2)
  • Maneesh Singh (3)
  • Ming-Hsuan Yang (1, 4)
  1. University of California, Merced, USA
  2. Virginia Tech, Blacksburg, USA
  3. Verisk Analytics, Jersey City, USA
  4. Google Cloud, Merced, USA
