Label Denoising with Large Ensembles of Heterogeneous Neural Networks

  • Pavel Ostyakov
  • Elizaveta Logacheva
  • Roman Suvorov
  • Vladimir Aliev
  • Gleb Sterkin
  • Oleg Khomenko
  • Sergey I. Nikolenko
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)


Despite recent advances in computer vision based on various convolutional architectures, video understanding remains an important challenge. In this work, we present and discuss a top solution for the large-scale video classification (labeling) problem introduced as a Kaggle competition based on the YouTube-8M dataset. We show and compare different approaches to preprocessing, data augmentation, model architectures, and model combination. Our final model is based on a large ensemble of video- and frame-level models but fits into rather limiting hardware constraints. We apply an approach based on knowledge distillation to deal with noisy labels in the original dataset and the recently developed mixup technique to improve the basic models.
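
The two training techniques named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The mixup step follows the standard formulation of Zhang et al., where inputs and labels of two examples are interpolated with a coefficient drawn from a Beta distribution. The distillation step is an assumption: it blends the noisy dataset labels with ensemble predictions using a hypothetical `weight` parameter, which is one common way to form soft targets. The function names, the default `alpha`, and the default `weight` are illustrative only.

```python
import numpy as np


def mixup_batch(features, labels, alpha=0.4, rng=None):
    """Standard mixup: convex combination of a batch with a shuffled copy.

    alpha is the Beta distribution parameter (value chosen for illustration).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)              # mixing coefficient in [0, 1]
    perm = rng.permutation(len(features))     # partner example for each row
    mixed_x = lam * features + (1.0 - lam) * features[perm]
    mixed_y = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_x, mixed_y, lam


def distilled_targets(noisy_labels, ensemble_probs, weight=0.5):
    """Assumed soft-target scheme: blend dataset labels with ensemble output.

    weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return weight * noisy_labels + (1.0 - weight) * ensemble_probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 3))               # 4 toy feature vectors
    y = np.eye(4)                              # 4 toy one-hot label vectors
    mixed_x, mixed_y, lam = mixup_batch(x, y, rng=rng)
    print(mixed_x.shape, mixed_y.shape)
    print(distilled_targets(np.array([[1.0, 0.0]]), np.array([[0.6, 0.4]])))
```

In a setup like this, a student model would be trained on the blended targets instead of the raw labels, so that confident ensemble predictions can soften labels that are likely wrong.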


Keywords: Video processing, Learning from noisy labels, Attention-based models, Recurrent neural networks, Deep learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Samsung AI Center, Moscow, Russia
