Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
A low-precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retraining. The resulting networks achieve improved accuracy, reduced memory footprint, and reduced computational complexity compared with conventional methods on the MNIST and CIFAR10 datasets. Our networks are up to 98% sparse and 5 and 11 times smaller than equivalent binary and ternary models respectively, translating to significant resource and speed benefits for hardware implementations.
Keywords: Deep Neural Networks · Ternary Neural Network · Low-precision · Pruning · Sparsity · Compression
This research was partly supported under the Australian Research Council's Linkage Projects funding scheme (project number LP130101034) and Zomojo Pty Ltd.
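The core operation behind the sparse ternary networks described above is thresholded ternary quantization: weights whose magnitude falls below a threshold are pruned to zero, and the remainder are mapped to ±1. The sketch below illustrates this idea under stated assumptions; the function names and the fixed threshold value are illustrative only, since in the paper's pipeline the threshold interacts with a trained quantization threshold regularizer rather than being hand-picked.

```python
import numpy as np

def ternarize(weights, delta):
    """Quantize real-valued weights to {-1, 0, +1}.

    Weights with |w| <= delta are pruned to zero (inducing sparsity);
    the rest are mapped to +1 or -1 by sign. `delta` is a hand-picked
    threshold here, purely for illustration.
    """
    q = np.zeros_like(weights)
    q[weights > delta] = 1.0
    q[weights < -delta] = -1.0
    return q

def sparsity(q):
    """Fraction of zero-valued weights in a quantized tensor."""
    return float(np.mean(q == 0))

# Toy example: five weights, threshold 0.4.
w = np.array([0.8, -0.05, 0.02, -0.6, 0.3])
q = ternarize(w, delta=0.4)
print(q)            # -> [ 1.  0.  0. -1.  0.]
print(sparsity(q))  # -> 0.6
```

A larger threshold prunes more aggressively, trading accuracy for sparsity; this is the trade-off the retraining stage is meant to recover.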