Abstract
Embedded computer vision applications for robotics, security cameras, and mobile phone apps require the use of mobile neural network architectures such as MobileNet-v2 or MNAS-Net to reduce RAM consumption and accelerate processing. 8-bit quantization of neural networks is an additional option for reducing resource consumption further. Unfortunately, known quantization methods lead to a significant accuracy drop (more than 1.2%) on mobile architectures and require long quantization-aware training.
To overcome this limitation, we propose a method that allows mobile neural networks to be quantized without significant accuracy loss. Our approach is based on trainable quantization thresholds for each neural network filter, which accelerates quantization-aware training by up to 10 times compared with standard techniques.
Using the proposed technique, we quantize modern mobile neural network architectures with an accuracy loss not exceeding 0.1%. Ready-for-use models and code are available at:
https://github.com/agoncharenko1992/FAT-fast-adjustable-threshold.
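The full method is described in the body of the paper; as a rough illustration of the trainable-threshold idea summarized in the abstract, the sketch below fake-quantizes a tensor to 8 bits using a learnable clipping threshold and a straight-through estimator. It is a simplified PyTorch approximation written for this page (the released code linked above is in TensorFlow), and the names `FakeQuant`, the scalar `threshold`, and the gradient rule for the threshold are illustrative assumptions rather than the authors' exact formulation.

```python
import torch


class FakeQuant(torch.autograd.Function):
    """Symmetric 8-bit fake quantization with a learnable clipping threshold.

    Forward: clip x to [-threshold, threshold], map to signed integers,
    and dequantize back to floats.
    Backward: straight-through estimator (STE) for x; the threshold
    receives gradient only from the saturated (clipped) region.
    """

    @staticmethod
    def forward(ctx, x, threshold, num_bits=8):
        qmax = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
        scale = threshold / qmax                 # quantization step
        q = torch.clamp(torch.round(x / scale), -qmax, qmax)
        ctx.save_for_backward(x, threshold)
        return q * scale

    @staticmethod
    def backward(ctx, grad_out):
        x, threshold = ctx.saved_tensors
        inside = (x.abs() <= threshold).to(grad_out.dtype)
        grad_x = grad_out * inside                                   # STE inside the range
        grad_t = (grad_out * (1.0 - inside) * torch.sign(x)).sum()   # from clipped values
        return grad_x, grad_t, None


# Hypothetical usage: fake-quantize weights with a trainable threshold.
threshold = torch.tensor(1.0, requires_grad=True)
weights = torch.randn(64, 3, 3, 3, requires_grad=True)
w_q = FakeQuant.apply(weights, threshold)
loss = (w_q ** 2).mean()
loss.backward()
print(threshold.grad)  # gradient contributed by the clipped weights
```

In this sketch the threshold is a single scalar; a per-filter variant, as mentioned in the abstract, would instead hold one threshold per output channel and broadcast it over the weight tensor.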
Notes
- 1. https://developer.nvidia.com/tensorrt - NVIDIA TensorRT™ platform, 2018.
- 2.
- 3. https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/g3doc/models.md - Image classification (Quantized Models).
References
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.: Inverted residuals and linear bottlenecks: mobile networks for classification, detection and segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition CVPR (2018)
Tan, M., Chen, B., Pang, R., Vasudevan, V., Le, Q.V.: MnasNet: platform-aware neural architecture search for mobile. arXiv preprint arXiv:1807.11626 (2018)
Lee, J.H., Ha, S., Choi, S., Lee, W., Lee, S.: Quantization for rapid deployment of deep neural networks. arXiv preprint arXiv:1810.05488 (2018)
Jacob, B., et al.: Quantization and training of neural networks for efficient integer-arithmetic-only inference. In: Conference on Computer Vision and Pattern Recognition CVPR (2018)
Hinton, G., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015)
Mishra, A., Marr, D.: Apprentice: using knowledge distillation techniques to improve low-precision network accuracy. arXiv preprint arXiv:1711.05852 (2017)
Mishra, A., Nurvitadhi, E., Cook, J.J., Marr, D.: WRPN: wide reduced-precision networks. arXiv preprint arXiv:1709.01134 (2017)
Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016)
Courbariaux, M., Bengio, Y., David, J.: Training deep neural networks with low precision multiplications. In: International Conference on Learning Representations ICLR (2015)
Hubara, I., Courbariaux, M., Soudry, D., El-Yaniv, R., Bengio, Y.: Binarized neural networks. In: Advances in Neural Information Processing Systems (NIPS), pp. 4107–4115 (2016)
Rastegari, M., Ordonez, V., Redmon, J., Farhadi, A.: XNOR-Net: ImageNet classification using binary convolutional neural networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9908, pp. 525–542. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46493-0_32
Zhou, S., Wu, Y., Ni, Z., Zhou, X., Wen, H., Zou, Y.: DoReFa-Net: training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv preprint arXiv:1606.06160 (2016)
Bengio, Y., Leonard, N., Courville, A.C.: Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432 (2013)
McDonnell, M.D.: Training wide residual networks for deployment using a single bit for each weight. In: International Conference on Learning Representations ICLR (2018)
Zhu, S., Dong, X., Su, H.: Binary ensemble neural network: more bits per network or more networks per bit? arXiv preprint arXiv:1806.07550 (2018)
Baskin, C., et al.: NICE: noise injection and clamping estimation for neural network quantization. arXiv preprint arXiv:1810.00162 (2018)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575 (2014)
Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning ICML (2015)
Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations ICLR (2015)
Sheng, T., Feng, C., Zhuo, S., Zhang, X., Shen, L., Aleksic, M.: A quantization-friendly separable convolution for MobileNets. arXiv preprint arXiv:1803.08607 (2018)
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Goncharenko, A., Denisov, A., Alyamkin, S., Terentev, E. (2019). Trainable Thresholds for Neural Network Quantization. In: Rojas, I., Joya, G., Catala, A. (eds) Advances in Computational Intelligence. IWANN 2019. Lecture Notes in Computer Science, vol. 11507. Springer, Cham. https://doi.org/10.1007/978-3-030-20518-8_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20517-1
Online ISBN: 978-3-030-20518-8