Accelerating Block-Circulant Matrix-Based Neural Network Layer on a General Purpose Computing Platform: A Design Guideline

Pugdeethosapol, Krittaphat; Jin, Zhao; Rider, Daniel; Qiu, Qinru

doi:10.1007/978-3-030-39442-4_32

Accelerating Block-Circulant Matrix-Based Neural Network Layer on a General Purpose Computing Platform: A Design Guideline

Krittaphat Pugdeethosapol¹⁷,
Zhao Jin¹⁷,
Daniel Rider¹⁷ &
…
Qinru Qiu¹⁷

Conference paper
First Online: 13 February 2020

1323 Accesses

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1130))

Abstract

Deep neural networks (DNNs) have become a powerful tool and enabled the state-of-the art accuracy on many challenging tasks. However, large-scale DNNs highly consume both computational time and storage space. To optimize and improve the performance of the network while maintaining the accuracy, the block-circulant matrix-based (BCM) algorithm has been introduced. BCM utilizes the Fast Fourier Transform (FFT) with block-circulant matrices to compute the output of each layer of the network. Unlike conventional pruning techniques, the network structure is maintained while using the BCM. Compared to conventional matrix implementation, the BCM reduces the computational complexity of a neural network layer from O(n^2) to O(n^2/k), and it has been proven to be highly effective when implemented using customized hardware, such as FPGAs. However, its performance suffers from overhead of FFT and matrix reshaping on general purpose computing platforms. In certain cases, using the BCM does not improve the total computation time of the networks at all. In this paper, we propose a parallel implementation of the BCM layer and guidelines that generally lead to better implementation practice is provided. The guidelines run across popular implementation language and packages including Python, numpy, intel-numpy, tensorflow, and nGraph.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 169.00; Price excludes VAT (USA)

Softcover Book: USD 219.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

References

Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, pp. 160–167. ACM (2008)
Google Scholar
Huval, B., Wang, T., Tandon, S., Kiske, J., Song, W., Pazhayampallil, J., Andriluka, M., Rajpurkar, P., Migimatsu, T., Cheng-Yue, R., et al.: An empirical evaluation of deep learning on highway driving. arXiv preprint arXiv:1504.01716 (2015)
Burbidge, R., Trotter, M., Buxton, B., Holden, S.: Drug design by machine learning: support vector machines for pharmaceutical data analysis. Comput. Chem. 26(1), 5–14 (2001)
Article Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Ding, C., Liao, S., Wang, Y., Li, Z., Liu, N., Zhuo, Y., Wang, C., Qian, X., Bai, Y., Yuan, G., Ma, X.: CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices. In: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 395–408. ACM, October 2017
Google Scholar
Cheng, Y., Yu, F.X., Feris, R.S., Kumar, S., Choudhary, A., Chang, S.F.: An exploration of parameter redundancy in deep networks with circulant projections. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2857–2865 (2015)
Google Scholar
Saxe, A.M., Koh, P.W., Chen, Z., Bhand, M., Suresh, B., Ng, A.Y.: On random weights and unsupervised feature learning. In: ICML, vol. 2, no. 3, p. 6, June 2011
Google Scholar
Molchanov, P., Tyree, S., Karras, T., Aila, T., Kautz, J.: Pruning convolutional neural networks for resource efficient transfer learning. arXiv preprint arXiv:1611.06440, 3 (2016)
Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems, pp. 1135–1143 (2015)
Google Scholar
Luo, J.H., Wu, J., Lin, W.: ThiNet: a filter level pruning method for deep neural network compression. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5058–5066 (2017)
Google Scholar
Cai, Z., He, X., Sun, J., Vasconcelos, N.: Deep learning with low precision by half-wave Gaussian quantization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5918–5926 (2017)
Google Scholar
Gong, Y., Liu, L., Yang, M., Bourdev, L.: Compressing deep convolutional networks using vector quantization. arXiv preprint arXiv:1412.6115 (2014)
Zhu, M., Gupta, S.: To prune, or not to prune: exploring the efficacy of pruning for model compression. arXiv preprint arXiv:1710.01878 (2017)
Mathieu, M., Henaff, M., LeCun, Y.: Fast training of convolutional networks through FFTs. arXiv preprint arXiv:1312.5851 (2013)
Pan, V.Y.: Structured Matrices and Polynomials: Unified Superfast Algorithms. Springer, Boston (2012)
MATH Google Scholar
Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., Jordan, M.I., Stoica, I.: Ray: a distributed framework for emerging AI applications. In: 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2018), pp. 561–577 (2018)
Google Scholar

Download references

Author information

Authors and Affiliations

Syracuse University, Syracuse, NY, 13244, USA
Krittaphat Pugdeethosapol, Zhao Jin, Daniel Rider & Qinru Qiu

Authors

Krittaphat Pugdeethosapol
View author publications
You can also search for this author in PubMed Google Scholar
Zhao Jin
View author publications
You can also search for this author in PubMed Google Scholar
Daniel Rider
View author publications
You can also search for this author in PubMed Google Scholar
Qinru Qiu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Krittaphat Pugdeethosapol .

Editor information

Editors and Affiliations

Faculty of Science and Engineering, Saga University, Saga, Japan
Kohei Arai
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Supriya Kapoor
The Science and Information (SAI) Organization, Bradford, West Yorkshire, UK
Rahul Bhatia

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Pugdeethosapol, K., Jin, Z., Rider, D., Qiu, Q. (2020). Accelerating Block-Circulant Matrix-Based Neural Network Layer on a General Purpose Computing Platform: A Design Guideline. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Advances in Information and Communication. FICC 2020. Advances in Intelligent Systems and Computing, vol 1130. Springer, Cham. https://doi.org/10.1007/978-3-030-39442-4_32

Download citation

DOI: https://doi.org/10.1007/978-3-030-39442-4_32
Published: 13 February 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39441-7
Online ISBN: 978-3-030-39442-4
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics