Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
A low-precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retraining. The resulting networks achieve improved accuracy, reduced memory footprint, and reduced computational complexity compared with conventional methods on the MNIST and CIFAR10 datasets. Our networks are up to 98% sparse and 5 and 11 times smaller than equivalent binary and ternary models respectively, translating to significant resource and speed benefits for hardware implementations.
Keywords: Deep Neural Networks · Ternary Neural Network · Low-precision · Pruning · Sparsity · Compression
This research was partly supported under the Australian Research Council's Linkage Projects funding scheme (project number LP130101034) and Zomojo Pty Ltd.
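The core operation behind the sparse ternary networks described above is thresholded ternary quantization: weights whose magnitude falls below a threshold are pruned to zero, and the remainder are mapped to ±1. The sketch below illustrates this idea under stated assumptions; the function names and the fixed threshold value are illustrative only, since in the paper's pipeline the threshold interacts with a trained quantization threshold regularizer rather than being hand-picked.

```python
import numpy as np

def ternarize(weights, delta):
    """Quantize real-valued weights to {-1, 0, +1}.

    Weights with |w| <= delta are pruned to zero (inducing sparsity);
    the rest are mapped to +1 or -1 by sign. `delta` is a hand-picked
    threshold here, purely for illustration.
    """
    q = np.zeros_like(weights)
    q[weights > delta] = 1.0
    q[weights < -delta] = -1.0
    return q

def sparsity(q):
    """Fraction of zero-valued weights in a quantized tensor."""
    return float(np.mean(q == 0))

# Toy example: five weights, threshold 0.4.
w = np.array([0.8, -0.05, 0.02, -0.6, 0.3])
q = ternarize(w, delta=0.4)
print(q)            # -> [ 1.  0.  0. -1.  0.]
print(sparsity(q))  # -> 0.6
```

A larger threshold prunes more aggressively, trading accuracy for sparsity; this is the trade-off the retraining stage is meant to recover.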