High-Resolution Realistic Image Synthesis from Text Using Iterative Generative Adversarial Network

  • Anwar Ullah
  • Xinguo Yu
  • Abdul Majid
  • Hafiz Ur Rahman
  • M. Farhan Mughal
Conference paper, part of the Lecture Notes in Computer Science book series (LNCS, volume 11854)


Synthesizing high-resolution realistic images from a text description with a single-iteration Generative Adversarial Network (GAN) is difficult without additional techniques, because blurry artifacts and mode collapse commonly occur. To reduce these problems, this paper proposes an Iterative Generative Adversarial Network (iGAN) that takes three iterations to synthesize a high-resolution realistic image from its text description. In the \(1^{st}\) iteration, the GAN synthesizes a low-resolution \(64 \times 64\) pixel image with basic shape and basic color from the text description, with reduced mode collapse and blurry artifacts. In the \(2^{nd}\) iteration, the GAN takes the result of the \(1^{st}\) iteration together with the text description and synthesizes a higher-resolution \(128 \times 128\) pixel image with better shape and color, with even fewer mode collapse and blurry-artifact problems. In the last iteration, the GAN takes the result of the \(2^{nd}\) iteration and the text description and synthesizes a high-resolution \(256 \times 256\) pixel, well-shaped, clear image with almost no mode collapse or blurry artifacts. Our proposed iGAN shows significant performance on the CUB birds and Oxford-102 flowers datasets. Moreover, iGAN improves the inception score and human rank compared to other state-of-the-art methods.
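The three-iteration pipeline described above (text → 64×64 → 128×128 → 256×256, with each stage conditioned on the text embedding and the previous stage's output) can be sketched as follows. This is a minimal structural sketch, not the authors' implementation: `text_encoder` and `stage_generator` are hypothetical stand-ins for the trained encoder and GAN generators, and the arithmetic inside them is placeholder logic that only preserves the shapes and data flow.

```python
import numpy as np

def text_encoder(text):
    # Hypothetical stand-in for a learned text encoder: derives a
    # deterministic 128-dim embedding from the description.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(128)

def stage_generator(text_emb, prev_image, out_size):
    """Stub for one GAN iteration: conditions on the text embedding and,
    when available, on the previous iteration's image, and emits an RGB
    image at the target resolution (real iGAN stages are trained CNNs)."""
    if prev_image is None:
        img = np.zeros((out_size, out_size, 3))
    else:
        # Nearest-neighbour upsample of the previous result as the base.
        scale = out_size // prev_image.shape[0]
        img = prev_image.repeat(scale, axis=0).repeat(scale, axis=1)
    # Placeholder per-channel text conditioning (not a learned network).
    return img + 0.01 * text_emb[:3]

def igan_synthesize(text):
    emb = text_encoder(text)
    image = None
    for size in (64, 128, 256):  # the three iterations in the paper
        image = stage_generator(emb, image, size)
    return image

out = igan_synthesize("a small bird with a red head and white belly")
print(out.shape)  # (256, 256, 3)
```

The essential design point the sketch captures is that each iteration refines rather than restarts: later stages receive both the text and the earlier low-resolution result, which is what lets resolution and detail improve while keeping the image consistent with the description.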


Keywords: Generative Adversarial Network (GAN) · Iterative GAN · Text-to-image synthesis · CUB dataset · Oxford-102 dataset · Inception score · Human rank



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Anwar Ullah (1)
  • Xinguo Yu (1)
  • Abdul Majid (1)
  • Hafiz Ur Rahman (2)
  • M. Farhan Mughal (3)
  1. National Engineering Research Center for E-Learning, Central China Normal University, Wuhan, China
  2. School of Computer Science, Guangzhou University, Guangzhou, China
  3. Tianjin University of Finance and Economics, Tianjin, China
