
A Heterogeneous and Reconfigurable Embedded Architecture for Energy-Efficient Execution of Convolutional Neural Networks

  • Konstantin Lübeck
  • Oliver Bringmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11479)

Abstract

Machine-learning-based convolutional neural networks (CNNs) are becoming increasingly popular for identification tasks such as image classification or speech recognition. However, CNNs have high memory and computational demands, which makes it challenging to implement them on cost-efficient and energy-autonomous hardware. To cope with this challenge, we present a heterogeneous and reconfigurable embedded architecture implemented on an inexpensive and widely available entry-level system on chip (SoC). Our architecture combines an ARM CPU and a coarse-grained reconfigurable architecture (CGRA), which execute a CNN in parallel to reach higher energy efficiency. Our results show up to 130% higher performance and 78% better energy efficiency compared with an embedded Nvidia GPU.
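The abstract's core idea, running one CNN on a CPU and a CGRA in parallel, can be illustrated with a minimal sketch. The paper's actual partitioning scheme is described in the full text, not here, so the static output-channel split below and the names `conv2d_valid` and `run_partition` are illustrative assumptions; two Python threads merely stand in for the two compute units.

```python
import threading

def conv2d_valid(image, kernel):
    """Naive valid 2D convolution (cross-correlation) on nested lists."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for y in range(len(out)):
        for x in range(len(out[0])):
            s = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    s += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = s
    return out

def run_partition(image, kernels, results, indices):
    # One worker (standing in for the CPU or the CGRA) computes
    # only its assigned output feature maps.
    for i in indices:
        results[i] = conv2d_valid(image, kernels[i])

image = [[float((x + y) % 3) for x in range(6)] for y in range(6)]
kernels = [[[1.0] * 3 for _ in range(3)] for _ in range(4)]  # 4 output maps
results = [None] * len(kernels)

# Static split: first half of the output maps goes to one compute
# unit, the second half to the other; both run concurrently.
split = len(kernels) // 2
cpu = threading.Thread(target=run_partition,
                       args=(image, kernels, results, range(split)))
cgra = threading.Thread(target=run_partition,
                        args=(image, kernels, results, range(split, len(kernels))))
cpu.start(); cgra.start()
cpu.join(); cgra.join()
```

In a real deployment the split would be tuned so that both units finish a layer at roughly the same time; an even channel split like the one above is only the simplest possible load balance.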


Acknowledgments

This work has been partially funded by the Stiftung Industrieforschung through the scholarship for master’s theses and the German Federal Ministry of Education and Research (BMBF) under grant number 16ES0876 (GENIAL!).


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science, University of Tübingen, Tübingen, Germany
