Abstract
Accessing large external DRAM is costly, which makes it challenging to evaluate data-intensive convolutional neural networks (CNNs) efficiently on embedded devices. These external memory accesses can be minimized by exploiting data reuse in on-chip memory. Selecting the combination of code transformations that minimizes external DRAM accesses is, however, an extremely complex task. In this work, a mathematical model is presented that quickly and precisely evaluates combinations of code transformations on CNNs. An accompanying open-source tool leverages this model to perform automated design space exploration and code generation for CNNs. The correctness of the model is demonstrated by measurements on seven neural networks. Results show that the transformations selected by the tool can reduce external memory accesses by over an order of magnitude.
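To make the idea of trading on-chip buffer space for fewer external accesses concrete, the following is a minimal sketch of a toy access-count model for a single convolution layer. It is not the model or the tool presented in this work. All function names, the stride-1 assumption, the "one word per element" cost, and the exhaustive search over tile sizes are illustrative assumptions.

```python
from itertools import product


def naive_accesses(h, w, c_in, c_out, k):
    """External accesses with no on-chip reuse (toy assumption):
    every multiply-accumulate fetches one input and one weight from DRAM,
    and every output element is written once."""
    macs = h * w * c_out * c_in * k * k
    return 2 * macs + h * w * c_out


def buffer_words(c_in, k, th, tw, tc):
    """On-chip words needed to hold one input patch, the weights for
    tc output channels, and one output tile (stride 1 assumed)."""
    inputs = (th + k - 1) * (tw + k - 1) * c_in
    weights = tc * c_in * k * k
    outputs = th * tw * tc
    return inputs + weights + outputs


def tiled_accesses(h, w, c_in, c_out, k, th, tw, tc):
    """External accesses when each th x tw x tc output tile is computed
    entirely from on-chip buffers that are filled once per tile."""
    # Hypothetical simplification: tile sizes must divide the layer evenly.
    assert h % th == 0 and w % tw == 0 and c_out % tc == 0
    tiles = (h // th) * (w // tw) * (c_out // tc)
    return tiles * buffer_words(c_in, k, th, tw, tc)


def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]


def best_tiling(h, w, c_in, c_out, k, capacity):
    """Brute-force search for the tile sizes that fit in `capacity` words
    and minimize external accesses. Returns (cost, (th, tw, tc)) or None."""
    best = None
    for th, tw, tc in product(divisors(h), divisors(w), divisors(c_out)):
        if buffer_words(c_in, k, th, tw, tc) > capacity:
            continue  # tile does not fit in the on-chip buffer
        cost = tiled_accesses(h, w, c_in, c_out, k, th, tw, tc)
        if best is None or cost < best[0]:
            best = (cost, (th, tw, tc))
    return best


if __name__ == "__main__":
    layer = dict(h=32, w=32, c_in=16, c_out=32, k=3)
    print("no reuse:", naive_accesses(**layer))
    print("best tiling for 4096 words:", best_tiling(**layer, capacity=4096))
```

Even this toy version shows why the choice is hard to make by hand: the number of candidate tilings grows with every layer dimension, and a real design space additionally includes transformations such as loop reordering and layer fusion that are not modeled here.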
Acknowledgements
This work is supported by NWO project CPS-P3 (12695).
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Waeijen, L., Sioutas, S., He, Y., Peemen, M., Corporaal, H. (2019). Automatic Memory-Efficient Scheduling of CNNs. In: Pnevmatikatos, D., Pelcat, M., Jung, M. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2019. Lecture Notes in Computer Science, vol 11733. Springer, Cham. https://doi.org/10.1007/978-3-030-27562-4_28
DOI: https://doi.org/10.1007/978-3-030-27562-4_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27561-7
Online ISBN: 978-3-030-27562-4
eBook Packages: Computer Science, Computer Science (R0)