Generating Synthetic Video Sequences by Explicitly Modeling Object Motion

  • S. Palazzo
  • C. Spampinato
  • P. D’Oro
  • D. Giordano
  • M. Shah
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11130)

Abstract

Recent GAN-based video generation approaches model videos as the combination of a time-independent scene component and a time-varying motion component, thus factorizing the generation problem into generating background and foreground separately. One of the main limitations of current approaches is that both factors are learned by mapping a single source latent space to videos, which complicates the generation task because a single data point must encode both background and foreground content. In this paper we propose a GAN framework for video generation that instead employs two latent spaces in order to structure the generative process in a more natural way: (1) a latent space for generating the static visual content of a scene (background), which remains the same for the whole video, and (2) a latent space where motion is encoded as a trajectory between sampled points, whose dynamics are modeled by an RNN encoder (jointly trained with the generator and the discriminator) and then mapped by the generator to the motion of visual objects. Performance evaluation shows that our approach both controls the generation process effectively and synthesizes more realistic videos than state-of-the-art methods.
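
As a concrete illustration of the two-latent-space design described above, the sketch below samples one static content code per video and builds a motion trajectory by interpolating between sampled latent points; an RNN encoder turns that trajectory into per-frame motion codes, which the generator combines with the fixed content code to decode frames. This is a minimal sketch: all module names, layer sizes, and the choice of linear interpolation are illustrative assumptions, not the authors' actual architecture.

```python
# Minimal PyTorch sketch of a two-latent-space video GAN generator.
# All names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn


class MotionEncoder(nn.Module):
    """RNN that maps a sampled latent trajectory to per-frame motion codes
    (in the paper's setup, trained jointly with generator and discriminator)."""

    def __init__(self, motion_dim=16, hidden_dim=64):
        super().__init__()
        self.rnn = nn.GRU(motion_dim, hidden_dim, batch_first=True)

    def forward(self, trajectory):             # (B, T, motion_dim)
        codes, _ = self.rnn(trajectory)
        return codes                            # (B, T, hidden_dim)


class VideoGenerator(nn.Module):
    """Decodes one static content code plus T motion codes into T frames."""

    def __init__(self, content_dim=128, motion_dim=64, channels=3):
        super().__init__()
        self.decode = nn.Sequential(            # toy 32x32 frame decoder
            nn.Linear(content_dim + motion_dim, 256 * 4 * 4),
            nn.Unflatten(1, (256, 4, 4)),
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, channels, 4, 2, 1), nn.Tanh(),
        )

    def forward(self, z_content, motion_codes):
        B, T, _ = motion_codes.shape
        # Repeat the same content code for every frame: the background is
        # fixed while only the motion codes vary across time.
        z = z_content.unsqueeze(1).expand(-1, T, -1)
        x = torch.cat([z, motion_codes], dim=-1).reshape(B * T, -1)
        frames = self.decode(x)                 # (B*T, C, 32, 32)
        return frames.view(B, T, *frames.shape[1:])


# Usage: one content point per video, one motion trajectory obtained by
# linearly interpolating between two sampled points in the motion space.
B, T = 2, 8
p0, p1 = torch.randn(B, 16), torch.randn(B, 16)   # sampled trajectory endpoints
alpha = torch.linspace(0, 1, T).view(1, T, 1)
trajectory = (1 - alpha) * p0.unsqueeze(1) + alpha * p1.unsqueeze(1)

encoder, generator = MotionEncoder(), VideoGenerator()
z_content = torch.randn(B, 128)                   # static scene latent
video = generator(z_content, encoder(trajectory))  # (B, T, 3, 32, 32)
```

Because the content code is repeated across all frames while only the motion codes vary over time, the background is constrained to stay static and all temporal variation must flow through the motion latent space, mirroring the factorization described in the abstract.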

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • S. Palazzo (1)
  • C. Spampinato (1, 2)
  • P. D’Oro (1)
  • D. Giordano (1)
  • M. Shah (2)

  1. Pattern Recognition and Computer Vision (PeRCeiVe) Lab, University of Catania, Catania, Italy
  2. Center for Research in Computer Vision, University of Central Florida, Orlando, USA
