Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10635)

Conference series: Neural Information Processing (ICONIP 2017)

Abstract

A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retraining. The resulting networks achieve improved accuracy, reduced memory footprint and reduced computational complexity compared with conventional methods on the MNIST and CIFAR-10 datasets. Our networks are up to 98% sparse and 5 and 11 times smaller than equivalent binary and ternary models respectively, translating to significant resource and speed benefits for hardware implementations.
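The three-stage flow described above (regularised full-precision training, quantization pruning, then retraining) can be sketched in a few lines. The sketch below is an illustration only: the exact form of the quantization threshold regularizer, the ternary levels ±alpha, the pruning threshold, the toy model and the PyTorch framing are all assumptions, not the paper's definitions.

```python
import torch
import torch.nn as nn

# Toy model standing in for the MNIST/CIFAR-10 networks in the paper.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def ternary_reg(w, alpha=1.0):
    """Assumed sparsity-inducing regularizer: squared distance of each weight
    to its nearest ternary level in {-alpha, 0, +alpha}."""
    levels = torch.stack([-alpha * torch.ones_like(w),
                          torch.zeros_like(w),
                          alpha * torch.ones_like(w)])
    return (w.unsqueeze(0) - levels).abs().min(dim=0).values.pow(2).sum()

# Stage 1: full-precision training with L2 plus the quantization regularizer.
def train_step(x, y, lam_l2=1e-4, lam_q=1e-4):
    opt.zero_grad()
    loss = ce(model(x), y)
    for p in model.parameters():
        loss = loss + lam_l2 * p.pow(2).sum() + lam_q * ternary_reg(p)
    loss.backward()
    opt.step()

# Stage 2: quantization pruning -- weights below a threshold become zero (pruned),
# the rest snap to +/-alpha. The threshold value here is a placeholder.
masks = {}
def quantize_prune(alpha=1.0, delta=0.05):
    for name, p in model.named_parameters():
        if p.dim() > 1:                                   # weight matrices only
            keep = (p.detach().abs() >= delta).to(p.dtype)
            masks[name] = keep
            p.data = torch.sign(p.data) * alpha * keep

# Stage 3: retraining with the pruned connections held at zero.
def retrain_step(x, y):
    opt.zero_grad()
    ce(model(x), y).backward()
    opt.step()
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])                       # re-zero pruned weights

# Example usage on random data:
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for _ in range(10):
    train_step(x, y)
quantize_prune()
for _ in range(10):
    retrain_step(x, y)
```

Re-applying the mask after every retraining step keeps pruned connections at zero; that fixed sparsity pattern is what translates into the memory and compute savings quoted in the abstract.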

Notes

  1. FINN quotes 2.5 LUTs per operation, which is multiplied by 2 to obtain the LUTs per MAC.

  2. Assuming 70% of the LUTs and 100% of the DSPs can be utilised for compute (a worked example of this arithmetic follows below).
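A small worked example of the arithmetic in these notes; the device LUT total below is a placeholder, not a figure from the paper.

```python
luts_per_op = 2.5                 # note 1: FINN's figure per operation
luts_per_mac = 2 * luts_per_op    # note 1: multiplied by 2 -> 5 LUTs per MAC

total_luts = 200_000              # hypothetical FPGA LUT count (placeholder)
usable_luts = 0.70 * total_luts   # note 2: 70% of LUTs assumed usable for compute
lut_macs = usable_luts / luts_per_mac

print(f"{luts_per_mac:.0f} LUTs per MAC; ~{lut_macs:.0f} parallel MACs in LUT fabric")
# Note 2 additionally assumes 100% of the DSPs are available for compute on top of this.
```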

References

  1. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition (2014)

  2. Luong, M.-T., Pham, H., Manning, C.D.: Effective approaches to attention-based neural machine translation (2015)

  3. Courbariaux, M., Bengio, Y., David, J.-P.: BinaryConnect: training deep neural networks with binary weights during propagations (2015)

  4. Courbariaux, M., Bengio, Y.: BinaryNet: training deep neural networks with weights and activations constrained to +1 or -1 (2016)

  5. Li, F., Liu, B.: Ternary weight networks (2016)

  6. Venkatesh, G., Nurvitadhi, E., Marr, D.: Accelerating deep convolutional networks using low-precision and sparsity (2016)

  7. Han, S., Mao, H., Dally, W.J.: Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. In: International Conference on Learning Representations (2015)

  8. Han, S., Pool, J., Tran, J., Dally, W.: Learning both weights and connections for efficient neural network. In: Advances in Neural Information Processing Systems (NIPS), pp. 1135–1143 (2015)

  9. Ardakani, A., Condo, C., Gross, W.J.: Sparsely-connected neural networks: towards efficient VLSI implementation of deep neural networks (2016)

  10. Umuroglu, Y., Fraser, N.J., Gambardella, G., Blott, M., Leong, P.H.W., Jahre, M., Vissers, K.A.: FINN: a framework for fast, scalable binarized neural network inference (2016)

  11. Fraser, N.J., Umuroglu, Y., Gambardella, G., Blott, M., Leong, P.H.W., Jahre, M., Vissers, K.A.: Scaling binarized neural networks on reconfigurable logic (2017)

Acknowledgements

This research was partly supported under the Australian Research Council's Linkage Projects funding scheme (project number LP130101034) and Zomojo Pty Ltd.

Author information

Corresponding author

Correspondence to Julian Faraone.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Faraone, J., Fraser, N., Gambardella, G., Blott, M., Leong, P.H.W. (2017). Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, E.S. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol. 10635. Springer, Cham. https://doi.org/10.1007/978-3-319-70096-0_41

  • DOI: https://doi.org/10.1007/978-3-319-70096-0_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-70095-3

  • Online ISBN: 978-3-319-70096-0

  • eBook Packages: Computer Science (R0)
