Trainable Thresholds for Neural Network Quantization

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11507)

Abstract

Embedded computer vision applications for robotics, security cameras, and mobile phone apps require mobile neural network architectures such as MobileNet-v2 or MNAS-Net to reduce RAM consumption and accelerate processing. Resource consumption can be reduced further with 8-bit neural network quantization. Unfortunately, known quantization methods lead to significant accuracy loss (more than 1.2%) on mobile architectures and require long quantization-aware training.

To overcome this limitation, we propose a method that quantizes mobile neural networks without significant accuracy loss. Our approach is based on trainable quantization thresholds for each neural network filter, which accelerates quantization-aware training by up to 10 times compared with standard techniques.

Using the proposed technique, we quantize modern mobile neural network architectures with an accuracy loss not exceeding 0.1%. Ready-to-use models and code are available at:

https://github.com/agoncharenko1992/FAT-fast-adjustable-threshold.
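
The mechanism described in the abstract (a clipping threshold learned per filter rather than fixed from calibration statistics) can be illustrated with a short, hypothetical PyTorch-style example: symmetric 8-bit fake quantization with a trainable threshold, using the common straight-through trick for the non-differentiable rounding step. This is a sketch of the general idea under our own assumptions, not the authors' implementation; their code is in the repository linked above.

    import torch

    def fake_quantize(x, threshold, num_bits=8):
        # Symmetric fake quantization with a trainable clipping threshold.
        # round() has zero gradient almost everywhere, so the standard
        # straight-through trick is used: the forward pass rounds, while
        # the backward pass treats rounding as the identity.
        qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
        scale = threshold / qmax
        x_clipped = torch.minimum(torch.maximum(x, -threshold), threshold)
        x_scaled = x_clipped / scale
        x_rounded = x_scaled + (torch.round(x_scaled) - x_scaled).detach()
        return x_rounded * scale                            # dequantized values

    # Hypothetical usage: one trainable threshold per output filter of a
    # convolution, initialized from the maximum absolute weight of that filter.
    weight = torch.randn(32, 16, 3, 3)                      # (out_ch, in_ch, kH, kW)
    threshold = torch.nn.Parameter(
        weight.abs().amax(dim=(1, 2, 3), keepdim=True))
    w_q = fake_quantize(weight, threshold)
    loss = w_q.pow(2).mean()                                # dummy loss for illustration
    loss.backward()
    print(threshold.grad.shape)                             # torch.Size([32, 1, 1, 1])

Because the threshold enters the computation graph through both the clipping and the scale, it receives gradients and can be updated together with the network weights, which mirrors the trainable-threshold idea described above.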


Notes

  1. https://developer.nvidia.com/tensorrt - NVIDIA TensorRT™ platform, 2018.

  2. https://github.com/NervanaSystems/distiller.

  3. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/models.md - Image classification (Quantized Models).


Author information

Correspondence to Alexander Goncharenko or Sergey Alyamkin.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Goncharenko, A., Denisov, A., Alyamkin, S., Terentev, E. (2019). Trainable Thresholds for Neural Network Quantization. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_26


  • DOI: https://doi.org/10.1007/978-3-030-20518-8_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20517-1

  • Online ISBN: 978-3-030-20518-8

  • eBook Packages: Computer Science, Computer Science (R0)
