
Constrained Optimization Based Low-Rank Approximation of Deep Neural Networks

  • Chong Li
  • C. J. Richard Shi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11214)

Abstract

We present COBLA (Constrained Optimization Based Low-rank Approximation), a systematic method for finding an optimal low-rank approximation of a trained convolutional neural network, subject to constraints on the number of multiply-accumulate (MAC) operations and the memory footprint. COBLA optimally allocates the constrained computation resources across the layers of the approximated network. The singular value decomposition of the network weights is computed, and a binary masking variable is introduced to indicate whether a particular singular value and its corresponding singular vectors are used in the low-rank approximation. With this formulation, the number of MAC operations and the memory footprint are expressed as linear constraints in terms of the binary masking variables. The resulting 0-1 integer programming problem is approximately solved by sequential quadratic programming. COBLA does not introduce any hyperparameters. We empirically demonstrate that COBLA outperforms prior art on the SqueezeNet and VGG-16 architectures on the ImageNet dataset.
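
As a rough illustration of the masking formulation described above, the sketch below (not the authors' implementation) keeps or discards individual singular values of each layer's weight matrix with a binary mask, so that the MAC count and parameter memory become linear functions of the mask entries. The layer shapes, the 40% MAC budget, and the greedy rule used to pick a feasible mask are illustrative assumptions only; COBLA itself solves the relaxed 0-1 integer program with sequential quadratic programming.

```python
# Minimal sketch of the binary-mask idea: each mask entry z_i decides whether
# singular value i of a layer's weight matrix is kept, so MAC count and
# parameter memory are linear in z. Shapes and budget are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" fully connected layers: weight matrices of shape (out, in).
layers = [rng.standard_normal((64, 128)), rng.standard_normal((32, 64))]

svds = [np.linalg.svd(W, full_matrices=False) for W in layers]

# Keeping singular value i of an (m x n) layer costs (m + n) parameters and
# (m + n) MACs per input vector, so both budgets are linear in the mask z.
costs = np.concatenate([np.full(min(W.shape), sum(W.shape)) for W in layers])

# Hypothetical budget: at most 40% of the original MACs.
budget = 0.4 * sum(W.size for W in layers)

# Crude feasible allocation for illustration: keep the globally largest
# singular values until the budget is exhausted (COBLA instead optimizes
# the relaxed 0-1 program with SQP to choose the masks).
sigmas = np.concatenate([s for (_, s, _) in svds])
z = np.zeros(len(sigmas), dtype=bool)
spent = 0.0
for i in np.argsort(-sigmas):
    if spent + costs[i] <= budget:
        z[i] = True
        spent += costs[i]

# Reconstruct each layer from its masked singular values and vectors.
approx, offset = [], 0
for (U, s, Vt), W in zip(svds, layers):
    k = len(s)
    mask = z[offset:offset + k]
    approx.append((U[:, mask] * s[mask]) @ Vt[mask, :])
    offset += k

for W, W_hat in zip(layers, approx):
    rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
    print(f"kept rank {int(np.linalg.matrix_rank(W_hat))}, "
          f"relative error {rel_err:.3f}")
```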

Keywords

Low-rank approximation · Resource allocation · Constrained optimization · Integer relaxation

Notes

Acknowledgment

The authors would like to thank the anonymous reviewers, particularly Reviewer 3, for their highly constructive advice. This work is supported by an Intel/Semiconductor Research Corporation Ph.D. Fellowship.


Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Washington, Seattle, USA
