GOHAG: GANs Orchestration for Human Actions Generation

  • Aziz Siyaev
  • Geun-Sik Jo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12033)


Generative Adversarial Networks (GANs) have made a major contribution to the development of content creation technologies. Video generation holds an important place in this progress, driven by the need for human animation applications and for automatic trailer or movie generation. Taking advantage of several GANs, we propose our own method for human movement video generation, GOHAG: GANs Orchestration for Human Actions Generation. GOHAG is an orchestra of three GANs: the Poses generation GAN (PGAN) creates a sequence of poses, the Poses Optimization GAN (POGAN) optimizes them, and the Frames generation GAN (FGAN) attaches texture to the sequence, creating a video. The proposed method generates smooth, plausible, high-quality video and shows potential among modern techniques.
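The three-stage data flow described above (poses, then optimized poses, then frames) can be sketched roughly as follows. This is only an illustration of the orchestration order, not the authors' implementation: the functions `pgan_stub`, `pogan_stub`, `fgan_stub` and `generate_video` are hypothetical stand-ins that contain no neural networks, and the joint count, frame size and smoothing rule are arbitrary assumptions.

```python
import random
from typing import List

Pose = List[float]         # flattened 2D keypoints: x0, y0, x1, y1, ...
Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def pgan_stub(num_frames: int, num_joints: int, rng: random.Random) -> List[Pose]:
    """Hypothetical stand-in for PGAN: a clamped random walk over joint coordinates."""
    pose = [rng.random() for _ in range(2 * num_joints)]
    sequence = []
    for _ in range(num_frames):
        pose = [min(1.0, max(0.0, c + rng.uniform(-0.05, 0.05))) for c in pose]
        sequence.append(pose)
    return sequence


def pogan_stub(poses: List[Pose]) -> List[Pose]:
    """Hypothetical stand-in for POGAN: average each pose with its temporal neighbours."""
    refined = []
    for t in range(len(poses)):
        window = poses[max(0, t - 1): t + 2]
        refined.append([sum(column) / len(window) for column in zip(*window)])
    return refined


def fgan_stub(pose: Pose, size: int = 16) -> Frame:
    """Hypothetical stand-in for FGAN: mark each joint as a bright pixel."""
    frame = [[0.0] * size for _ in range(size)]
    for x, y in zip(pose[0::2], pose[1::2]):
        col = min(size - 1, int(x * size))
        row = min(size - 1, int(y * size))
        frame[row][col] = 1.0
    return frame


def generate_video(num_frames: int, num_joints: int = 15, seed: int = 0) -> List[Frame]:
    """Chain the three stages in the order the abstract describes."""
    rng = random.Random(seed)
    poses = pgan_stub(num_frames, num_joints, rng)  # stage 1: pose sequence
    poses = pogan_stub(poses)                       # stage 2: pose refinement
    return [fgan_stub(p) for p in poses]            # stage 3: one frame per pose


if __name__ == "__main__":
    video = generate_video(num_frames=8)
    print(len(video), "frames of size", len(video[0]), "x", len(video[0][0]))
```

In a real system each stub would be replaced by a trained generator. The sketch only shows that the stages communicate through a pose sequence, so each one could be trained or swapped independently.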


Generative Adversarial Network, Poses Generation, Video generation



This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2017-0-01642) supervised by the IITP (Institute for Information & communications Technology Promotion).



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Inha University, Incheon, Republic of Korea
