Abstract
A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retraining. The resulting networks achieve improved accuracy, reduced memory footprint and reduced computational complexity compared with conventional methods on the MNIST and CIFAR-10 datasets. Our networks are up to 98% sparse, and 5 and 11 times smaller than equivalent binary and ternary models respectively, translating to significant resource and speed benefits for hardware implementations.
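To make the quantization-pruning stage concrete, the following is a minimal NumPy sketch of threshold-based ternarization. The symmetric threshold rule and the 0.7 scale factor follow the ternary weight network convention of Li and Liu (2016); the function name and hyperparameters are illustrative assumptions, not this paper's exact regularizer-driven thresholds.

```python
# Minimal sketch of threshold-based ternarization (assumed TWN-style rule;
# not the paper's exact trained-threshold regularizer).
import numpy as np

def ternarize(weights, delta_scale=0.7):
    """Map full-precision weights to {-ws, 0, +ws}.

    delta_scale * mean(|w|) is the pruning threshold; 0.7 is the TWN
    default and is an assumption here. Raising it increases sparsity.
    """
    delta = delta_scale * np.mean(np.abs(weights))
    mask = np.abs(weights) > delta                # surviving (non-pruned) weights
    ws = np.mean(np.abs(weights[mask])) if mask.any() else 0.0
    quantized = ws * np.sign(weights) * mask      # pruned weights become exactly 0
    sparsity = 1.0 - mask.mean()
    return quantized, sparsity

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=(128, 128))
wq, sparsity = ternarize(w)
print(f"unique values: {np.unique(wq).size}, sparsity: {sparsity:.1%}")
```

In the full three-stage pipeline, the threshold is applied after the regularized training stage, and the surviving ternary weights are then fixed while the network is retrained to recover accuracy.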
Notes
1. FINN quotes 2.5 LUTs per operation, which is multiplied by 2 to get LUTs per MAC.
2. Assuming 70% of the LUTs and 100% of the DSPs can be utilised for compute (this arithmetic is worked through in the sketch below).
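Together, the two notes define a back-of-envelope compute budget for an FPGA target. The sketch below works through that arithmetic; the device LUT and DSP counts in the usage example are hypothetical placeholders, not figures from the paper.

```python
# Worked version of the resource arithmetic in Notes 1 and 2.
LUTS_PER_OP = 2.5                      # figure quoted from FINN (Note 1)
LUTS_PER_MAC = 2 * LUTS_PER_OP         # a MAC counts as two operations (Note 1)

def peak_parallel_macs(total_luts, total_dsps, lut_util=0.70, dsp_util=1.00):
    """MACs implementable concurrently under the utilisation assumptions of Note 2.

    Assumes one MAC per DSP slice; total_luts / total_dsps are device
    figures supplied by the caller (hypothetical in the example below).
    """
    lut_macs = (lut_util * total_luts) / LUTS_PER_MAC
    dsp_macs = dsp_util * total_dsps
    return int(lut_macs + dsp_macs)

# Example with placeholder device figures:
print(peak_parallel_macs(total_luts=600_000, total_dsps=2_500))
# int(0.70 * 600000 / 5.0 + 2500) = 86500
```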
References
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014)
Luong, M.-T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation (2015)
Courbariaux, M., Bengio, Y., David, J.-P.: Binaryconnect: training deep neural networks with binary weights during propagations (2015)
Courbariaux, M., Bengio, Y.: Binarynet: training deep neural networks with weights and activations constrained to +1 or -1 (2016)
Li, F., Liu, B.: Ternary weight networks (2016)
Venkatesh, G., Nurvitadhi, E., Marr, D.: Accelerating deep convolutional networks using low-precision and sparsity (2016)
Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: International Conference on Learning Representations (2015)
Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems (NIPS), pp. 1135–1143 (2015)
Ardakani, A., Condo, C., Gross, W.J.: Sparsely-connected neural networks: towards efficient VLSI implementation of deep neural networks (2016)
Umuroglu, Y., Fraser, N.J., Gambardella, G., Blott, M., Leong, P.H.W., Jahre, M., Vissers, K.A.: FINN: a framework for fast, scalable binarized neural network inference (2016)
Fraser, N.J., Umuroglu, Y., Gambardella, G., Blott, M., Leong, P.H.W., Jahre, M., Vissers, K.A.: Scaling binarized neural networks on reconfigurable logic (2017)
Acknowledgements
This research was partly supported under the Australian Research Council's Linkage Projects funding scheme (project number LP130101034) and Zomojo Pty Ltd.