Abstract
Embedded computer vision applications for robotics, security cameras, and mobile phone apps require the use of mobile neural network architectures such as MobileNet-v2 or MNAS-Net to reduce RAM consumption and accelerate processing. 8-bit quantization of neural networks is an additional option for reducing resource consumption further. Unfortunately, known quantization methods lead to a significant accuracy drop (more than 1.2%) on mobile architectures and require long quantization-aware training.
To overcome this limitation, we propose a method that allows mobile neural networks to be quantized without significant accuracy loss. Our approach is based on trainable quantization thresholds for each neural network filter, which accelerates quantization-aware training by up to 10 times compared with standard techniques.
Using the proposed technique, we quantize modern mobile neural network architectures with an accuracy loss not exceeding 0.1%. Ready-for-use models and code are available at:
https://github.com/agoncharenko1992/FAT-fast-adjustable-threshold.
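The full method is described in the body of the paper; as a rough illustration of the trainable-threshold idea summarized in the abstract, the sketch below fake-quantizes a tensor to 8 bits using a learnable clipping threshold and a straight-through estimator. It is a simplified PyTorch approximation written for this page (the released code linked above is in TensorFlow), and the names `FakeQuant`, the scalar `threshold`, and the gradient rule for the threshold are illustrative assumptions rather than the authors' exact formulation.

```python
import torch


class FakeQuant(torch.autograd.Function):
    """Symmetric 8-bit fake quantization with a learnable clipping threshold.

    Forward: clip x to [-threshold, threshold], map to signed integers,
    and dequantize back to floats.
    Backward: straight-through estimator (STE) for x; the threshold
    receives gradient only from the saturated (clipped) region.
    """

    @staticmethod
    def forward(ctx, x, threshold, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
        scale = threshold / qmax                 # quantization step
        q = torch.clamp(torch.round(x / scale), -qmax, qmax)
        ctx.save_for_backward(x, threshold)
        return q * scale

    @staticmethod
    def backward(ctx, grad_out):
        x, threshold = ctx.saved_tensors
        inside = (x.abs() <= threshold).to(grad_out.dtype)
        grad_x = grad_out * inside                                   # STE inside the range
        grad_t = (grad_out * (1.0 - inside) * torch.sign(x)).sum()   # from clipped values
        return grad_x, grad_t, None


# Hypothetical usage: fake-quantize weights with a trainable threshold.
threshold = torch.tensor(1.0, requires_grad=True)
weights = torch.randn(64, 3, 3, 3, requires_grad=True)
w_q = FakeQuant.apply(weights, threshold)
loss = (w_q ** 2).mean()
loss.backward()
print(threshold.grad)  # gradient contributed by the clipped weights
```

In this sketch the threshold is a single scalar; a per-filter variant, as mentioned in the abstract, would instead hold one threshold per output channel and broadcast it over the weight tensor.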
Notes
- 1. https://developer.nvidia.com/tensorrt - NVIDIA TensorRT™ platform, 2018.
- 2.
- 3. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/models.md - Image classification (Quantized Models).
References
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2018)
Tan, M., Chen, B., Pang, R., Vasudevan, V., Le, Q.V.: MnasNet: platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626 (2018)
Lee, J.H., Ha, S., Choi, S., Lee, W., Lee, S.: Quantization for rapid deployment of deep neural networks. arXiv preprint arXiv:1810.05488 (2018)
Jacob, B., et al.: Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Conference on Computer Vision and Pattern Recognition CVPR (2018)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Mishra, A., Marr, D.: Apprentice: using knowledge distillation techniques to improve low-precision network accuracy. arXiv preprint arXiv:1711.05852 (2017)
Mishra, A., Nurvitadhi, E., Cook, J.J., Marr, D.: WRPN: wide reduced-precision networks. arXiv preprint arXiv:1709.01134 (2017)
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
Courbariaux, M., Bengio, Y., David, J.: Training deep neural networks with low precision multiplications. In: International Conference on Learning Representations ICLR (2015)
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 4107–4115 (2016)
Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_32
Zhou, S., Wu, Y., Ni, Z., Zhou, X., Wen, H., Zou, Y.: DoReFa-Net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)
Bengio, Y., Leonard, N., Courville, A.C.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
McDonnell, M.D.: Training wide residual networks for deployment using a single bit for each weight. In: International Conference on Learning Representations ICLR (2018)
Zhu, S., Dong, X., Su, H.: Binary ensemble neural network: more bits per network or more networks per bit? arXiv preprint arXiv:1806.07550 (2018)
Baskin, C., et al.: NICE: noise injection and clamping estimation for neural network quantization. arXiv preprint arXiv:1810.00162 (2018)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575 (2014)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning ICML (2015)
Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations ICLR (2015)
Sheng, T., Feng, C., Zhuo, S., Zhang, X., Shen, L., Aleksic, M.: A quantization-friendly separable convolution for MobileNets. arXiv preprint arXiv:1803.08607 (2018)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Goncharenko, A., Denisov, A., Alyamkin, S., Terentev, E. (2019). Trainable Thresholds for Neural Network Quantization. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol. 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20517-1
Online ISBN: 978-3-030-20518-8