The 2nd YouTube-8M Large-Scale Video Understanding Challenge

  • Joonseok Lee
  • Apostol (Paul) Natsev
  • Walter Reade
  • Rahul Sukthankar
  • George Toderici
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

We hosted the 2nd YouTube-8M Large-Scale Video Understanding Kaggle Challenge and Workshop at ECCV’18, with the task of classifying videos from frame-level and video-level audio-visual features. In this year’s challenge, we restricted the final model size to 1 GB or less, encouraging participants to explore representation learning or better architectures instead of heavy ensembles of multiple models. In this paper, we briefly introduce the YouTube-8M dataset and challenge task, followed by participant statistics and an analysis of the results. We summarize the ideas proposed by participants, including architectures, temporal aggregation methods, ensembling and distillation, data augmentation, and more.

Keywords

YouTube, Video Classification, Video Understanding

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Joonseok Lee (1), email author
  • Apostol (Paul) Natsev (1)
  • Walter Reade (1)
  • Rahul Sukthankar (1)
  • George Toderici (1)
  1. Google Research, Mountain View, USA
