Data-Driven Sparse Structure Selection for Deep Neural Networks

  • Zehao Huang
  • Naiyan Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11220)


Deep convolutional neural networks have demonstrated extraordinary power on various tasks. However, it is still very challenging to deploy state-of-the-art models in real-world applications due to their high computational complexity. How can we design a compact and effective network without massive experiments and expert knowledge? In this paper, we propose a simple and effective framework to learn and prune deep models in an end-to-end manner. In our framework, a new type of parameter, the scaling factor, is first introduced to scale the outputs of specific structures, such as neurons, groups or residual blocks. Then we add sparsity regularizations on these factors, and solve this optimization problem by a modified stochastic Accelerated Proximal Gradient (APG) method. By forcing some of the factors to zero, we can safely remove the corresponding structures, and thus prune the unimportant parts of a CNN. Compared with other structure selection methods that may need thousands of trials or iterative fine-tuning, our method is trained fully end-to-end in one training pass without bells and whistles. We evaluate our method, Sparse Structure Selection, with several state-of-the-art CNNs, and demonstrate very promising results with adaptive depth and width selection. Code is available at:
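
As a rough illustration of how a sparsity penalty on scaling factors can drive some of them exactly to zero, the sketch below applies one plain proximal gradient step with soft-thresholding. It is not the modified stochastic APG method from the paper: it assumes an L1 penalty, omits momentum, and the names `soft_threshold`, `proximal_step`, `kept_structures`, `lr` and `gamma` are hypothetical, chosen only for this example.

```python
def soft_threshold(x, tau):
    """Proximal operator of tau * |x|: shrink x toward zero by tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def proximal_step(factors, grads, lr, gamma):
    """One plain proximal gradient step on the scaling factors.

    factors: current scaling factors, one per structure (assumed layout)
    grads:   gradient of the data loss with respect to each factor
    lr:      step size (assumed name)
    gamma:   weight of the assumed L1 sparsity penalty
    """
    # Gradient step on the smooth loss, then shrinkage for the L1 term.
    return [soft_threshold(f - lr * g, lr * gamma)
            for f, g in zip(factors, grads)]


def kept_structures(factors):
    """Indices of structures whose scaling factor is still non-zero."""
    return [i for i, f in enumerate(factors) if f != 0.0]


# Toy values, made up for illustration only.
factors = [1.0, 0.05, -0.3, 0.01]
grads = [0.2, 0.1, -0.1, 0.0]
updated = proximal_step(factors, grads, lr=0.1, gamma=0.5)
survivors = kept_structures(updated)
```

Factors whose magnitude after the gradient step falls below `lr * gamma` become exactly zero, so the structures they scale could be removed without changing the network output; the remaining indices in `survivors` are the structures that would be kept.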


Keywords: Sparse, Model acceleration, Deep network structure learning



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. TuSimple, Beijing, China
