Abstract
In recent years, artificial neural networks have evolved rapidly and are applied in a wide range of fields. Meanwhile, to improve the computational efficiency of neural network applications, more and more neural network accelerators have been developed. Although task scheduling on heterogeneous systems has been studied intensively, traditional scheduling algorithms cannot be applied to neural network accelerators directly. Based on the typical characteristics of neural network accelerators, we formalize the task scheduling problem for neural networks and adapt two list-based heuristic scheduling algorithms, Heterogeneous-Earliest-Finish-Time (HEFT) and Critical-Path-on-a-Processor (CPOP). Inspired by the separable nature of neural network operations, we further propose two partition algorithms, the Iterative Partition Scheduling algorithm (IPS) and the Partition Scheduling Combination algorithm (PSC), which can be combined with the scheduling algorithms. Experiments on several typical neural networks show that the partition-based algorithms achieve roughly 2x to 3x speedup over the scheduling-only algorithms.
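To make the list-scheduling baseline concrete, the sketch below shows HEFT-style scheduling in the form described by Topcuoglu et al.: tasks are prioritized by upward rank and each is placed on the processor that minimizes its earliest finish time. This is a minimal illustration, not the paper's implementation; the toy DAG, the per-processor costs, and the simpler non-insertion placement policy are all assumptions made for brevity.

```python
# Minimal HEFT-style list scheduling sketch (after Topcuoglu et al.).
# The task graph and cost tables below are illustrative assumptions.
from functools import lru_cache

# DAG: task -> list of (successor, communication cost paid across processors)
succ = {"a": [("b", 2), ("c", 3)], "b": [("d", 1)], "c": [("d", 2)], "d": []}
# cost[task][p] = execution time of task on processor p (2 heterogeneous procs)
cost = {"a": [4, 6], "b": [3, 2], "c": [5, 4], "d": [2, 3]}
NUM_PROCS = 2

@lru_cache(maxsize=None)
def upward_rank(task):
    # rank_u(t) = average execution cost + max over successors of (comm + rank_u)
    avg = sum(cost[task]) / NUM_PROCS
    return avg + max((c + upward_rank(s) for s, c in succ[task]), default=0.0)

def heft():
    # Decreasing upward rank is always a valid topological order of the DAG.
    order = sorted(cost, key=upward_rank, reverse=True)
    proc_free = [0.0] * NUM_PROCS        # earliest time each processor is free
    finish, placed = {}, {}              # finish time and processor per task
    pred = {t: [] for t in cost}         # invert the DAG to find predecessors
    for t, edges in succ.items():
        for s, c in edges:
            pred[s].append((t, c))
    for t in order:
        best = None
        for p in range(NUM_PROCS):
            # Data from a predecessor on another processor pays the comm cost.
            ready = max((finish[u] + (c if placed[u] != p else 0)
                         for u, c in pred[t]), default=0.0)
            eft = max(proc_free[p], ready) + cost[t][p]
            if best is None or eft < best[0]:
                best = (eft, p)
        finish[t], placed[t] = best
        proc_free[best[1]] = best[0]
    return placed, finish

if __name__ == "__main__":
    placement, finish_times = heft()
    print(placement, finish_times)
```

The partition algorithms (IPS and PSC) build on such a scheduler by splitting separable operations into smaller tasks before (or interleaved with) scheduling, which exposes more parallelism across accelerators than scheduling the original operations alone.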
This work is partially supported by the National Key Research and Development Program of China (under Grant 2017YFB1003101), the NSF of China (under Grants 61432016, 61532016, 61672491, 61602441, 61602446, 61732002, 61702478, 61732007 and 61732020), Beijing Natural Science Foundation (JQ18013), the 973 Program of China (under Grant 2015CB358800), National Science and Technology Major Project (2018ZX01031102), the Transformation and Transfer of Scientific and Technological Achievements of Chinese Academy of Sciences (KFJ-HGZX-013), Key Research Projects in Frontier Science of Chinese Academy of Sciences (QYZDB-SSW-JSC001), Strategic Priority Research Program of Chinese Academy of Science (XDB32050200, XDC01020000) and Standardization Research Project of Chinese Academy of Sciences (BZ201800001).