
Gossiping the Videos: An Embedding-Based Generative Adversarial Framework for Time-Sync Comments Generation

  • Guangyi Lv
  • Tong Xu
  • Qi Liu
  • Enhong Chen (corresponding author)
  • Weidong He
  • Mingxiao An
  • Zhongming Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11441)

Abstract

Recent years have witnessed the rise of the time-sync “gossiping comment”, also known as “Danmu”, attached to online videos. Along this line, the automatic generation of Danmus may attract users through richer interaction. However, this task is extremely challenging due to the informal style of the comments and the “semantic gap” between text and video: Danmus are usually not straightforward descriptions of the video content, but subjective and diverse expressions. To that end, in this paper, we propose a novel Embedding-based Generative Adversarial (E-GA) framework to generate time-sync video comments with “gossiping” behavior. Specifically, we first model the informal style of comments via a semantic embedding inspired by variational autoencoders (VAE), and then generate Danmus in a generative adversarial manner to bridge the gap between visual and textual content. Extensive experiments on a large-scale real-world dataset demonstrate the effectiveness of our E-GA framework.
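To make the two components named above concrete, the following is a minimal, hypothetical PyTorch sketch of (1) a VAE-style encoder that embeds a comment into a latent semantic vector, and (2) a GAN-style discriminator that scores whether such a latent plausibly accompanies a video feature. All module names, dimensions, and the pairing of comment latents with video features are illustrative assumptions, not the authors' actual E-GA architecture.

# Illustrative sketch only; not the authors' implementation.
import torch
import torch.nn as nn

class CommentVAEEncoder(nn.Module):
    """Encode a token sequence into a Gaussian latent (mu, logvar), VAE-style."""
    def __init__(self, vocab_size, emb_dim=128, hidden=256, latent=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)

    def forward(self, tokens):
        _, h = self.rnn(self.embed(tokens))   # h: (1, batch, hidden)
        h = h.squeeze(0)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, mu, logvar

class MatchDiscriminator(nn.Module):
    """Score whether a comment latent plausibly matches a video feature."""
    def __init__(self, latent=64, video_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent + video_dim, 256),
            nn.LeakyReLU(0.2),
            nn.Linear(256, 1))  # real/fake logit

    def forward(self, z, video_feat):
        return self.net(torch.cat([z, video_feat], dim=-1))

A full training loop would, under these assumptions, pair the encoder with a decoder that reconstructs the comment (the VAE objective) and a generator trained to fool the discriminator (the adversarial objective), mirroring the two stages the abstract outlines.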


Acknowledgments

This research was partially supported by grants from the National Natural Science Foundation of China (Grant Nos. 61727809, U1605251, 61672483, and 61703386), the Anhui Provincial Natural Science Foundation (Grant No. 1708085QF140), and the Fundamental Research Funds for the Central Universities (Grant No. WK2150110006).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Guangyi Lv (1)
  • Tong Xu (1)
  • Qi Liu (1)
  • Enhong Chen (1), corresponding author
  • Weidong He (1)
  • Mingxiao An (1)
  • Zhongming Chen (2)
  1. Anhui Province Key Laboratory of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, China
  2. Quantum Lab, Research Institute of OPPO, Shanghai, China
