Approach for Video Classification with Multi-label on YouTube-8M Dataset

  • Kwangsoo Shin
  • Junhyeong Jeon
  • Seungbin Lee
  • Boyoung Lim
  • Minsoo Jeong
  • Jongho Nang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

Video traffic is increasing at a considerable rate due to the spread of personal media and advancements in media technology. Accordingly, there is a growing need for techniques that automatically classify videos. This paper applies the NetVLAD and NetFV models together with the Huber loss function to the multi-label video classification problem, using the YouTube-8M dataset for experimental verification. We explored various configurations suited to the dataset and optimized hyperparameters, ultimately obtaining a GAP score of 0.8668.
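As a rough illustration of the two quantities named in the abstract, the sketch below implements the element-wise Huber loss and the Global Average Precision (GAP) metric used for YouTube-8M evaluation. The function names and the default `delta`/`k` values are assumptions made for this sketch, not details taken from the paper.

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss: quadratic for small errors, linear for large ones.
    `delta` (assumed default) sets the quadratic-to-linear transition point."""
    abs_err = np.abs(y_pred - y_true)
    quadratic = np.minimum(abs_err, delta)   # error capped at delta
    linear = abs_err - quadratic             # excess beyond delta
    return np.mean(0.5 * quadratic ** 2 + delta * linear)

def gap_at_k(scores, labels, k=20):
    """Global Average Precision: pool the top-k predictions of every video,
    rank them globally by confidence, and average precision over the hits.
    `scores` / `labels` are per-video arrays of class confidences / 0-1 labels."""
    pairs, total_pos = [], 0
    for s, l in zip(scores, labels):
        total_pos += int(l.sum())
        for idx in np.argsort(s)[::-1][:k]:  # top-k classes for this video
            pairs.append((s[idx], l[idx]))
    pairs.sort(key=lambda p: -p[0])          # global ranking by confidence
    hits, ap = 0, 0.0
    for rank, (_, correct) in enumerate(pairs, start=1):
        if correct:
            hits += 1
            ap += hits / rank                # precision at this rank
    return ap / max(total_pos, 1)
```

On YouTube-8M, GAP is computed over the top 20 predictions per video: a correct label ranked high in the global ordering contributes near-full precision, while confidently wrong predictions push correct ones down the ranking and lower the score.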

Keywords

Video classification · Large-scale video · Multi-label

Notes

Acknowledgement

This work was partly supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-01772, Development of QA system for video story understanding to pass Video Turing Test), Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (2017-0-01781, Data Collection and Automatic Tuning System Development for the Video Understanding), and Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. 2017-0-00271, Development of Archive Solution and Content Management Platform).

Supplementary material

Supplementary material 1 (PDF, 554 KB)

References

  1. Arandjelovic, R., Gronat, P., Torii, A., Pajdla, T., Sivic, J.: NetVLAD: CNN architecture for weakly supervised place recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5297–5307 (2016)
  2. Campos Camunez, V., Jou, B., Giró Nieto, X., Torres Viñals, J., Chang, S.F.: Skip RNN: learning to skip state updates in recurrent neural networks. In: Proceedings of the Sixth International Conference on Learning Representations, Vancouver, April 30–May 3, 2018, pp. 1–17 (2018)
  3. Cho, K., Van Merriënboer, B., Bahdanau, D., Bengio, Y.: On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:1409.1259 (2014)
  4. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
  5. Huber, P.J.: Robust statistics. In: Lovric, M. (ed.) International Encyclopedia of Statistical Science, pp. 1248–1251. Springer, Heidelberg (2011)
  6. Jégou, H., Douze, M., Schmid, C., Pérez, P.: Aggregating local descriptors into a compact image representation. In: 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3304–3311. IEEE (2010)
  7. Kim, Y.: Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882 (2014)
  8. Miech, A., Laptev, I., Sivic, J.: Learnable pooling with context gating for video classification. arXiv e-prints (2017)
  9. Na, S., Yu, Y., Lee, S., Kim, J., Kim, G.: Encoding video and label priors for multi-label video classification on YouTube-8M dataset. arXiv e-prints (2017)
  10. Perronnin, F., Dance, C.: Fisher kernels on visual vocabularies for image categorization. In: 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2007)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science and Engineering, Sogang University, Seoul, South Korea
