
Space Efficient Quantization for Deep Convolutional Neural Networks

  • Regular Paper
  • Published in: Journal of Computer Science and Technology

Abstract

Deep convolutional neural networks (DCNNs) have shown outstanding performance in the fields of computer vision, natural language processing, and complex system analysis. As deeper layers improve performance, DCNNs incur higher computational complexity and larger storage requirements, making it extremely difficult to deploy them on resource-limited embedded systems (such as mobile devices or Internet of Things devices). Network quantization efficiently reduces the storage space required by DCNNs; however, their performance often drops rapidly as the quantization bit-width decreases. In this article, we propose a space-efficient quantization scheme that uses eight or fewer bits to represent the original 32-bit weights. We adopt the singular value decomposition (SVD) method to decrease the parameter size of fully-connected layers for further compression. Additionally, we propose a weight clipping method based on a dynamic boundary to improve performance at lower precision. Experimental results demonstrate that our approach achieves up to approximately 14x compression while preserving almost the same accuracy as the full-precision models. The proposed weight clipping method also significantly improves the performance of DCNNs when lower precision is required.
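
The abstract does not spell out the exact quantization, clipping, or decomposition algorithms, so the following Python sketch only illustrates the general ideas it names: symmetric uniform quantization to eight or fewer bits, a clipping boundary derived from the weight distribution rather than its absolute maximum (a hypothetical stand-in for the paper's dynamic boundary), and a truncated SVD of a fully-connected weight matrix. All function names, the percentile-based boundary, and the chosen rank are illustrative assumptions, not the authors' method.

import numpy as np

def clip_and_quantize(w, bits=8, clip_ratio=0.95):
    # Clip weights to a data-dependent boundary, then quantize uniformly.
    # The percentile-based boundary below is an illustrative assumption,
    # not the paper's exact dynamic-boundary rule.
    boundary = np.quantile(np.abs(w), clip_ratio)
    w_clipped = np.clip(w, -boundary, boundary)
    levels = 2 ** (bits - 1) - 1           # e.g., 127 for 8 signed bits
    scale = boundary / levels
    q = np.round(w_clipped / scale).astype(np.int32)
    return q, scale                        # dequantize as q * scale

def svd_compress(fc_weight, rank):
    # Truncated SVD of a fully-connected weight matrix: W ~ A @ B,
    # storing rank*(m+n) values instead of m*n.
    u, s, vt = np.linalg.svd(fc_weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]             # shape (m, rank)
    b = vt[:rank, :]                       # shape (rank, n)
    return a, b

# Toy usage: factorize a fully-connected layer, then quantize both factors.
w = np.random.randn(1024, 1024).astype(np.float32)
a, b = svd_compress(w, rank=128)
qa, scale_a = clip_and_quantize(a, bits=8)
qb, scale_b = clip_and_quantize(b, bits=8)

Clipping to a percentile instead of max(|w|) keeps a few outlier weights from inflating the quantization step size, which is one plausible reading of why a dynamic boundary helps at low bit-widths.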



Author information

Correspondence to Fan Li or Yu Wang.

Electronic supplementary material

ESM 1

(PDF 327 kb)


About this article

Cite this article

Zhao, DD., Li, F., Sharif, K. et al. Space Efficient Quantization for Deep Convolutional Neural Networks. J. Comput. Sci. Technol. 34, 305–317 (2019). https://doi.org/10.1007/s11390-019-1912-1
