Memory Bandwidth and Energy Efficiency Optimization of Deep Convolutional Neural Network Accelerators

  • Zikai Nie
  • Zhisheng Li
  • Lei Wang
  • Shasha Guo
  • Qiang Dou
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 908)

Abstract

Deep convolutional neural networks (DNNs) achieve state-of-the-art accuracy, but at the cost of massive computation and memory operations. Although highly-parallel devices effectively meet the computational requirements, energy efficiency remains a challenge.

In this paper, we present two novel computation sequences, \(N\!H\!W\!C_{fine}\) and \(N\!H\!W\!C_{coarse}\), for DNN accelerators. We then combine the two computation sequences with appropriate data layouts. The proposed modes enable continuous memory access patterns and reduce the number of memory accesses, which is achieved by leveraging and transforming the local data reuse of weights and feature maps in high-dimensional convolutions. A minimal sketch of the underlying idea follows.
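The abstract does not spell out the exact \(N\!H\!W\!C_{fine}\) and \(N\!H\!W\!C_{coarse}\) sequences, so the sketch below is only an illustrative assumption: a direct convolution whose loop nest iterates the channel dimension innermost, so that with an NHWC data layout consecutive accesses to the input feature map and the weights touch contiguous memory. The function name `conv_nhwc` and the tiny sizes in `main` are hypothetical and chosen only for the example.

```c
/* Illustrative sketch (assumption, not the paper's exact sequences):
 * channel-innermost direct convolution over NHWC data.
 *   Input  : in  [H][W][C]      (batch size 1, NHWC)
 *   Weights: wgt [K][R][S][C]   (output channel, kernel h, kernel w, input channel)
 *   Output : out [H_out][W_out][K]
 */
#include <stdio.h>

static void conv_nhwc(const float *in, const float *wgt, float *out,
                      int H, int W, int C, int K, int R, int S)
{
    int H_out = H - R + 1;
    int W_out = W - S + 1;

    for (int oh = 0; oh < H_out; ++oh)
        for (int ow = 0; ow < W_out; ++ow)
            for (int k = 0; k < K; ++k) {
                float acc = 0.0f;
                for (int r = 0; r < R; ++r)
                    for (int s = 0; s < S; ++s)
                        for (int c = 0; c < C; ++c) {
                            /* innermost loop walks the channel dimension:
                             * both operands are contiguous in an NHWC layout */
                            acc += in[((oh + r) * W + (ow + s)) * C + c]
                                 * wgt[((k * R + r) * S + s) * C + c];
                        }
                out[(oh * W_out + ow) * K + k] = acc;
            }
}

int main(void)
{
    /* Tiny hypothetical example: 4x4x3 input, 2 output channels, 3x3 kernels. */
    enum { H = 4, W = 4, C = 3, K = 2, R = 3, S = 3 };
    float in[H * W * C], wgt[K * R * S * C], out[(H - R + 1) * (W - S + 1) * K];

    for (int i = 0; i < H * W * C; ++i)     in[i]  = (float)i / (H * W * C);
    for (int i = 0; i < K * R * S * C; ++i) wgt[i] = 1.0f / (R * S * C);

    conv_nhwc(in, wgt, out, H, W, C, K, R, S);
    printf("out[0][0][0] = %f\n", out[0]);
    return 0;
}
```

Because the innermost loop strides by one element in both the input and the weight arrays, a DRAM burst fetches only useful data; matching the data layout to the computation sequence in this way is what makes the memory access pattern continuous.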

Experiments with various convolutional layers show that the proposed modes, each combining a computation sequence with a data layout, are more energy efficient than the baseline mode across various networks. Total energy consumption is reduced by up to 4.10\(\times \), and off-chip memory access latency by up to 5.11\(\times \).

Keywords

Deep learning · Convolutional neural network · Acceleration · Memory efficiency · Data layout

Copyright information

© Springer Nature Singapore Pte Ltd. 2018

Authors and Affiliations

  • Zikai Nie (1)
  • Zhisheng Li (1)
  • Lei Wang (1)
  • Shasha Guo (1)
  • Qiang Dou (1)
  1. National University of Defense Technology, Changsha, China
