
Training Compact Deep Learning Models for Video Classification Using Circulant Matrices

  • Alexandre Araujo
  • Benjamin Negrevergne
  • Yann Chevaleyre
  • Jamal Atif
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11132)

Abstract

In real-world scenarios, model accuracy is hardly the only factor to consider: large models consume more memory and are computationally more intensive, which makes them difficult to train and to deploy, especially on mobile devices. In this paper, we build on recent results at the crossroads of linear algebra and deep learning which demonstrate how imposing a structure on large weight matrices can reduce the size of the model. Based on these results, we propose very compact models for video classification built on state-of-the-art network architectures such as Deep Bag-of-Frames, NetVLAD and NetFisherVectors, and we conduct thorough experiments on the large YouTube-8M video classification dataset. As we show, the circulant DBoF embedding achieves an excellent trade-off between size and accuracy.
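The underlying idea is that a circulant matrix is fully determined by a single vector (its first column), and its matrix-vector product can be computed in O(n log n) via the FFT instead of O(n²). The following NumPy sketch illustrates this trick; it is an illustration only, not the authors' implementation, and the helper name `circulant_matvec` is ours.

```python
import numpy as np

def circulant_matvec(c, x):
    # Multiply the circulant matrix C (whose first column is c) by x.
    # A circulant product is a circular convolution, so it can be
    # evaluated in the Fourier domain: C x = IFFT( FFT(c) * FFT(x) ).
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

# Sanity check against the explicit n x n circulant matrix.
n = 8
c = np.random.randn(n)
x = np.random.randn(n)
C = np.column_stack([np.roll(c, k) for k in range(n)])  # column k is c shifted by k
assert np.allclose(C @ x, circulant_matvec(c, x))
```

Replacing a dense n x n weight matrix by a circulant one thus reduces the number of stored parameters from n² to n, which is the source of the memory savings exploited in the paper.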

Keywords

Deep learning · Computer vision · Structured matrices · Circulant matrices


Acknowledgement

This work was granted access to the OpenPOWER prototype from GENCI-IDRIS under the Preparatory Access AP010610510 made by GENCI. We would like to thank the staff of IDRIS, who were very helpful throughout this work, as well as Abdelmalek Lamine and Tahar Nguira, interns at Wavestone, for their work on circulant matrices. Finally, we would also like to thank Wavestone for supporting this research.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. PSL, Université Paris-Dauphine, LAMSADE, CNRS, UMR 7243, Paris, France
  2. Wavestone, Paris, France
