Abstract
The traditional CNN algorithm is computation-intensive and difficult to optimize, and on existing hardware platforms the computational throughput is poorly matched to the memory bandwidth. Existing schemes neither take full advantage of the logic resources nor make full use of the memory bandwidth, so neither achieves the best performance. In this paper, we adopt the im2col method commonly used in software implementations to convert the convolution operation into matrix multiplication, which effectively improves the computation speed. On the hardware side, we propose a nested-loop optimization structure. First, the correlation among the loop parameters is analyzed, the number of multiplications is reduced, and the multiplications in the inner loop are replaced by additions; as a result, the maximum operating frequency and the power consumption are improved remarkably. Second, the input data and the convolution kernels are partitioned at multiple levels: the multi-layer input data are grouped by 2k, the data of each layer are further optimized in L groups, and the convolution kernels are likewise grouped by 2k and operated on in parallel with synchronized data. The resulting structure significantly improves the degree of parallelism, and both the external bandwidth and the internal bandwidth can be improved significantly under the same total amount of computation.
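The im2col transformation mentioned in the abstract can be illustrated with a minimal sketch. The function names (`im2col`, `conv2d_via_matmul`) and the single-channel, stride-1, "valid"-padding setting are illustrative assumptions, not the paper's actual implementation; the idea shown is only the general technique of unfolding receptive fields into columns so that convolution reduces to one matrix multiplication:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw patch of a 2D input into one column.

    Each column holds one receptive field, so convolving with a
    kernel becomes a single row-vector-by-matrix multiplication.
    Assumes stride 1 and no padding ("valid" convolution).
    """
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1   # output size for stride 1, no padding
    cols = np.empty((kh * kw, oh * ow))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv2d_via_matmul(x, k):
    """Convolution (in the cross-correlation sense used by CNNs)
    computed as a matmul over the unfolded input."""
    kh, kw = k.shape
    cols = im2col(x, kh, kw)
    out = k.ravel() @ cols            # one row vector of output pixels
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return out.reshape(oh, ow)
```

The payoff of this reshaping is that the inner loops disappear into a dense matrix product, which is exactly the operation hardware accelerators and BLAS libraries optimize best; the cost is duplicating overlapping patches in memory.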
© 2018 Springer Nature Singapore Pte Ltd.
Tang, F., Zhang, W., Tian, X., Fan, X., Cao, X. (2018). Optimization of Convolution Neural Network Algorithm Based on FPGA. In: Bi, Y., Chen, G., Deng, Q., Wang, Y. (eds) Embedded Systems Technology. ESTC 2017. Communications in Computer and Information Science, vol 857. Springer, Singapore. https://doi.org/10.1007/978-981-13-1026-3_10
Print ISBN: 978-981-13-1025-6
Online ISBN: 978-981-13-1026-3