Abstract
Convolutional neural networks (CNNs) are becoming increasingly popular for recognition tasks such as image classification and speech recognition. However, CNNs have high memory and computational demands, which makes it challenging to implement them on cost-efficient and energy-autonomous hardware. To cope with this challenge, we present a heterogeneous and reconfigurable embedded architecture implemented on an inexpensive and widely available entry-level system on chip (SoC). Our architecture combines an ARM CPU and a coarse-grained reconfigurable architecture (CGRA), which execute a CNN in parallel to achieve higher energy efficiency. Our results show up to 130% higher performance and 78% better energy efficiency compared with an embedded Nvidia GPU.
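The abstract does not detail how the work of a CNN is divided between the CPU and the CGRA. As a minimal, hypothetical illustration (none of the names or the partitioning scheme below are taken from the paper), one common approach is to split the output channels of a convolutional layer between two compute units and merge the results, which this Python sketch simulates on a single machine:

```python
# Hypothetical sketch: partition a convolutional layer's output channels
# between two compute units (here "CPU" and "CGRA", both simulated) and
# merge the per-unit results into the full layer output.

def conv2d(inp, kernel):
    """Naive 'valid' 2D convolution of one input map with one kernel."""
    ih, iw = len(inp), len(inp[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(inp[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(kh) for dx in range(kw))
             for x in range(iw - kw + 1)]
            for y in range(ih - kh + 1)]

def conv_layer(inp, kernels):
    """One output feature map per kernel (single input channel for brevity)."""
    return [conv2d(inp, k) for k in kernels]

# 4x4 input map and three 2x2 kernels (three output channels).
inp = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
kernels = [[[1, 0], [0, 1]],
           [[0, 1], [1, 0]],
           [[1, 1], [1, 1]]]  # 2x2 box sum

split = 2  # channels 0..1 on the "CPU", channel 2 on the "CGRA"
cpu_out = conv_layer(inp, kernels[:split])   # computed by the CPU cores
cgra_out = conv_layer(inp, kernels[split:])  # offloaded to the CGRA
merged = cpu_out + cgra_out                  # concatenate along channel dim

# Partitioned execution must match computing the whole layer at once.
assert merged == conv_layer(inp, kernels)
print(merged[2][0][0])  # box sum of the top-left 2x2 patch -> prints 14
```

Because the two partitions are independent, the CPU slice and the CGRA slice can run concurrently; the speedup then depends on how well the split ratio matches the relative throughput of the two units.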
Acknowledgments
This work has been partially funded by the Stiftung Industrieforschung through the scholarship for master’s theses and the German Federal Ministry of Education and Research (BMBF) under grant number 16ES0876 (GENIAL!).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lübeck, K., Bringmann, O. (2019). A Heterogeneous and Reconfigurable Embedded Architecture for Energy-Efficient Execution of Convolutional Neural Networks. In: Schoeberl, M., Hochberger, C., Uhrig, S., Brehm, J., Pionteck, T. (eds) Architecture of Computing Systems – ARCS 2019. ARCS 2019. Lecture Notes in Computer Science(), vol 11479. Springer, Cham. https://doi.org/10.1007/978-3-030-18656-2_20
DOI: https://doi.org/10.1007/978-3-030-18656-2_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-18655-5
Online ISBN: 978-3-030-18656-2