Abstract
The traditional CNN algorithm is computation-intensive and difficult to optimize, and on existing hardware platforms the computational throughput is poorly matched to the memory bandwidth. Existing schemes neither take full advantage of the logic resources nor make full use of the memory bandwidth, so neither achieves the best performance. In this paper, we adopt the im2col method commonly used in software implementations to convert the convolution operation into matrix multiplication, which effectively improves the computation speed. On the hardware side, we propose a nested-loop optimization structure. First, the correlation among the loop parameters is analyzed, the number of multiplications is reduced, and the multiplications in the inner loop are replaced by additions; as a result, the maximum operating frequency and the power consumption are improved remarkably. Second, the input data and the convolution kernels are partitioned at multiple levels: the multi-layer input data are grouped by 2k, the data of each layer are further optimized in L groups, and the convolution kernels are likewise grouped by 2k and operated on in parallel with synchronized data. The resulting structure significantly improves the degree of parallelism, and both the external bandwidth and the internal bandwidth can be improved significantly under the same total amount of computation.
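The im2col transformation mentioned in the abstract can be illustrated with a minimal sketch. The function names (`im2col`, `conv2d_via_matmul`) and the single-channel, stride-1, "valid"-padding setting are illustrative assumptions, not the paper's actual implementation; the idea shown is only the general technique of unfolding receptive fields into columns so that convolution reduces to one matrix multiplication:

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold every kh x kw patch of a 2D input into one column.

    Each column holds one receptive field, so convolving with a
    kernel becomes a single row-vector-by-matrix multiplication.
    Assumes stride 1 and no padding ("valid" convolution).
    """
    h, w = x.shape
    oh, ow = h - kh + 1, w - kw + 1   # output size for stride 1, no padding
    cols = np.empty((kh * kw, oh * ow))
    idx = 0
    for i in range(oh):
        for j in range(ow):
            cols[:, idx] = x[i:i + kh, j:j + kw].ravel()
            idx += 1
    return cols

def conv2d_via_matmul(x, k):
    """Convolution (in the cross-correlation sense used by CNNs)
    computed as a matmul over the unfolded input."""
    kh, kw = k.shape
    cols = im2col(x, kh, kw)
    out = k.ravel() @ cols            # one row vector of output pixels
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return out.reshape(oh, ow)
```

The payoff of this reshaping is that the inner loops disappear into a dense matrix product, which is exactly the operation hardware accelerators and BLAS libraries optimize best; the cost is duplicating overlapping patches in memory.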
© 2018 Springer Nature Singapore Pte Ltd.
Tang, F., Zhang, W., Tian, X., Fan, X., Cao, X. (2018). Optimization of Convolution Neural Network Algorithm Based on FPGA. In: Bi, Y., Chen, G., Deng, Q., Wang, Y. (eds) Embedded Systems Technology. ESTC 2017. Communications in Computer and Information Science, vol 857. Springer, Singapore. https://doi.org/10.1007/978-981-13-1026-3_10
Print ISBN: 978-981-13-1025-6
Online ISBN: 978-981-13-1026-3