
High-Resolution Realistic Image Synthesis from Text Using Iterative Generative Adversarial Network

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 11854))

Abstract

Synthesizing high-resolution realistic images from text descriptions with a single-pass Generative Adversarial Network (GAN) is difficult without additional techniques, because blurry artifacts and mode collapse commonly occur. To mitigate these problems, this paper proposes an Iterative Generative Adversarial Network (iGAN) that takes three iterations to synthesize a high-resolution realistic image from its text description. In the \(1^{st}\) iteration, the GAN synthesizes a low-resolution \(64 \times 64\) pixel image with basic shape and color from the text description, with reduced mode collapse and blurry artifacts. In the \(2^{nd}\) iteration, the GAN takes the result of the \(1^{st}\) iteration together with the text description and synthesizes a higher-resolution \(128 \times 128\) pixel image with better shape and color and far fewer mode collapse and blurry-artifact problems. In the last iteration, the GAN takes the result of the \(2^{nd}\) iteration and the text description again and synthesizes a high-resolution \(256 \times 256\) image with well-defined shape, clear detail, and almost no mode collapse or blurry artifacts. The proposed iGAN shows significant performance on the CUB birds and Oxford-102 flowers datasets. Moreover, iGAN improves the inception score and human rank compared with other state-of-the-art methods.
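The coarse-to-fine pipeline described above can be expressed as a small structural sketch. This is not the authors' implementation: the `stage_generator` stub and `igan_synthesize` driver are hypothetical placeholders standing in for the three conditional generator networks, and they illustrate only the data flow (each stage re-conditions on the text embedding and the previous stage's output while doubling the spatial resolution from 64 to 128 to 256).

```python
def stage_generator(prev_image, text_embedding, out_size):
    """Hypothetical generator stub: a real stage would be a conditional
    conv net; here we just return a blank image at the target resolution
    to show the shapes flowing through the pipeline."""
    return [[0.0] * out_size for _ in range(out_size)]


def igan_synthesize(text_embedding):
    """Three-iteration synthesis as described in the abstract:
    iteration 1 maps text alone to a 64x64 image with basic shape and
    color; iterations 2 and 3 refine the previous result, conditioning
    on the text description again at each step."""
    image = stage_generator(None, text_embedding, 64)    # 1st iteration
    image = stage_generator(image, text_embedding, 128)  # 2nd iteration
    image = stage_generator(image, text_embedding, 256)  # 3rd iteration
    return image


result = igan_synthesize(text_embedding=[0.1] * 128)
print(len(result), len(result[0]))  # 256 256
```

Chaining the stages this way is what distinguishes the iterative design from a single-pass GAN: each stage only has to learn a modest refinement, which is the property the abstract credits with reducing mode collapse and blurry artifacts.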

This study is funded by the General Program of the National Natural Science Foundation of China (No: 61977029).



Author information

Corresponding author

Correspondence to Xinguo Yu.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Ullah, A., Yu, X., Majid, A., Rahman, H.U., Mughal, M.F. (2019). High-Resolution Realistic Image Synthesis from Text Using Iterative Generative Adversarial Network. In: Lee, C., Su, Z., Sugimoto, A. (eds) Image and Video Technology. PSIVT 2019. Lecture Notes in Computer Science, vol 11854. Springer, Cham. https://doi.org/10.1007/978-3-030-34879-3_17


  • DOI: https://doi.org/10.1007/978-3-030-34879-3_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34878-6

  • Online ISBN: 978-3-030-34879-3

  • eBook Packages: Computer Science (R0)
