Diverse Conditional Image Generation by Stochastic Regression with Latent Drop-Out Codes

  • Yang He
  • Bernt Schiele
  • Mario Fritz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)

Abstract

Recent advances in Deep Learning and probabilistic modeling have led to strong improvements in generative models for images. On the one hand, Generative Adversarial Networks (GANs) have contributed a highly effective adversarial learning procedure, but still suffer from stability issues. On the other hand, Conditional Variational Auto-Encoder (CVAE) models provide a sound way of conditional modeling but suffer from mode-mixing issues. Therefore, recent work has turned back to simple and stable regression models that are effective at generation but give up the sampling mechanism and the latent code representation. We propose a novel and efficient stochastic regression approach with latent drop-out codes that combines the merits of both lines of research. In addition, a new training objective increases coverage of the training distribution, leading to improvements over the state of the art in terms of both accuracy and diversity.
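The core idea of latent drop-out codes can be illustrated with a minimal sketch: instead of resampling a dropout mask at every forward pass, a set of fixed binary masks is sampled once and each mask serves as a latent code that selects a sub-network, so different codes map the same input to different outputs. The network sizes and names below are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer regression network (hypothetical sizes; the paper
# uses a convolutional image-generation network).
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((2, 8))

def generate(x, z):
    """Map input x to an output conditioned on a latent drop-out code z.

    z is a fixed binary mask over the hidden units; each distinct mask
    picks out a different sub-network and hence a different prediction.
    """
    h = np.maximum(W1 @ x, 0.0)  # hidden activations (ReLU)
    return W2 @ (h * z)          # gate hidden units with the code

x = rng.standard_normal(4)
keep_prob = 0.5
# Sample several latent codes; each yields one sample from the
# one-to-many mapping for the same conditioning input x.
codes = (rng.random((3, 8)) < keep_prob).astype(float)
samples = [generate(x, z) for z in codes]
```

In this sketch, diversity comes entirely from the choice of mask, while the forward pass itself stays a deterministic regression, which is what makes training as stable as for a plain regressor.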

Keywords

Image generation · Improving diversity · One-to-many mapping · Nonparametrics

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany