Accelerating CNNs Using Optimized Scheduling Strategy

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11336)

Abstract

Convolutional neural networks (CNNs) are widely applied in image and video recognition, recommender systems, and natural language processing. But CNNs are computationally intensive, and their computational cost is often prohibitive. Since the convolution layers account for most of a CNN's computation, efforts to speed up CNNs focus on optimizing convolution, and many algorithms have been proposed to accelerate the convolution layers. However, each algorithm has its advantages and disadvantages, and no single algorithm handles all situations best. In this paper, we examine the performance of various convolution algorithms in a GPU environment. By building customized CNN models, we explore in depth how the network structure affects each algorithm's performance, including inference/training speed and memory consumption. Beyond the algorithms themselves, we also study how their GPU implementations affect performance. Finally, we summarize the characteristics of each algorithm and design a scheduling strategy that assigns the most appropriate implementation to each convolutional layer of a CNN. With our strategy, AlexNet runs 1.2x to 2.8x faster on GPUs than with other strategies. This work is valuable for understanding these algorithms and may provide insights for further optimization of the architecture of GPUs and accelerators.

This work is supported by the National Natural Science Foundation of China (No. 61672526) and Research Project of NUDT (ZK17-03-06).
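
The core idea, benchmarking each candidate convolution algorithm on a layer's actual shapes and scheduling the fastest one per layer, can be sketched as follows. This is a minimal, single-channel NumPy/SciPy illustration, not the authors' code: the three candidates stand in for the direct, FFT-based, and im2col+GEMM kernels that GPU libraries such as cuDNN expose (cuDNN's cudnnFindConvolutionForwardAlgorithm performs a similar per-layer search), and all names below are hypothetical.

```python
"""Hypothetical sketch: pick the fastest convolution algorithm per layer shape."""
import time
import numpy as np
from scipy.signal import convolve2d, fftconvolve


def conv_direct(x, w):
    # Direct 2-D convolution ('valid' mode, single channel).
    return convolve2d(x, w, mode="valid")


def conv_fft(x, w):
    # FFT-based convolution; tends to win for large inputs and kernels.
    return fftconvolve(x, w, mode="valid")


def conv_im2col(x, w):
    # im2col + GEMM: lower the convolution to a matrix multiply,
    # the approach behind many GPU convolution implementations.
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    # Gather every kh x kw patch into one row of a matrix.
    cols = np.lib.stride_tricks.sliding_window_view(x, (kh, kw))
    cols = cols.reshape(oh * ow, kh * kw)
    # convolve2d flips the kernel, so flip here for identical output.
    return (cols @ w[::-1, ::-1].reshape(-1)).reshape(oh, ow)


def fastest_algorithm(x, w, repeats=3):
    """Time each candidate on this layer's shapes; return the winner."""
    candidates = {"direct": conv_direct, "fft": conv_fft, "im2col": conv_im2col}
    timings = {}
    for name, fn in candidates.items():
        t0 = time.perf_counter()
        for _ in range(repeats):
            fn(x, w)
        timings[name] = (time.perf_counter() - t0) / repeats
    return min(timings, key=timings.get), timings


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two layer shapes with potentially different winners: a large
    # input with a big kernel, and a small input with a 3x3 kernel.
    for h, k in [(224, 11), (28, 3)]:
        x, w = rng.standard_normal((h, h)), rng.standard_normal((k, k))
        best, times = fastest_algorithm(x, w)
        print(f"input {h}x{h}, kernel {k}x{k}: best = {best}  {times}")
```

On real GPUs the winner shifts with layer shape in a similar way: FFT-based convolution tends to pay off for large kernels, Winograd-style fast algorithms for small 3x3 kernels, and GEMM-based convolution is a robust default, which is exactly what a per-layer scheduling strategy exploits.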


Author information

Corresponding author

Correspondence to Sheng Ma.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, R., Ma, S., Li, W., Guo, Y. (2018). Accelerating CNNs Using Optimized Scheduling Strategy. In: Vaidya, J., Li, J. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2018. Lecture Notes in Computer Science, vol 11336. Springer, Cham. https://doi.org/10.1007/978-3-030-05057-3_15

  • DOI: https://doi.org/10.1007/978-3-030-05057-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-05056-6

  • Online ISBN: 978-3-030-05057-3

  • eBook Packages: Computer Science (R0)
