Tensor-Solver for Deep Neural Network

Chapter in Compact and Fast Machine Learning Accelerator for IoT Devices

Part of the book series: Computer Architecture and Design Methodologies (CADM)

Abstract

This chapter introduces a tensorized formulation for compressing neural networks during training. By reshaping neural network weight matrices into high-dimensional tensors with low-rank decomposition, significant compression can be achieved while maintaining accuracy. A layer-wise training algorithm for the tensorized multilayer neural network is further introduced, based on the modified alternating least-squares (MALS) method. The proposed tensorized neural network (TNN) algorithm provides state-of-the-art results on various benchmarks at significant compression rates. The accuracy can be further improved by fine-tuning with backward propagation (BP). Significant compression rates are achieved on the MNIST and CIFAR-10 datasets. In addition, a 3D multi-layer CMOS-RRAM accelerator architecture is proposed for energy-efficient and highly parallel computation. (Figures and illustrations may be reproduced from [29, 30, 31].)
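As a concrete illustration of the tensorized formulation, the sketch below reshapes a fully-connected weight matrix into a high-dimensional tensor and compresses it with TT-SVD, the successive-truncated-SVD construction of the tensor-train format [52]. It is a minimal NumPy sketch under assumed shapes and an assumed rank cap, not the chapter's training algorithm (which optimizes the tensor cores directly via MALS [27]).

    import numpy as np

    def tt_svd(tensor, max_rank):
        """Split `tensor` into TT cores G_k of shape (r_{k-1}, n_k, r_k)."""
        dims = tensor.shape
        cores, r_prev = [], 1
        c = tensor.reshape(dims[0], -1)              # first unfolding
        for k in range(len(dims) - 1):
            u, s, vt = np.linalg.svd(c, full_matrices=False)
            r = min(max_rank, s.size)                # truncate to the rank cap
            cores.append(u[:, :r].reshape(r_prev, dims[k], r))
            c = (s[:r, None] * vt[:r]).reshape(r * dims[k + 1], -1)
            r_prev = r
        cores.append(c.reshape(r_prev, dims[-1], 1))
        return cores

    def tt_reconstruct(cores):
        """Contract the cores back into the full tensor."""
        out = cores[0]
        for core in cores[1:]:
            out = np.tensordot(out, core, axes=([-1], [0]))
        return out[0, ..., 0]                        # drop the boundary ranks

    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256))              # a dense layer's weights
    T = W.reshape((4,) * 8)                          # tensorize: 256*256 = 4**8
    cores = tt_svd(T, max_rank=8)
    tt_params = sum(core.size for core in cores)
    print(f"dense {W.size} -> TT {tt_params} ({W.size / tt_params:.0f}x smaller)")
    err = np.linalg.norm(tt_reconstruct(cores) - T) / np.linalg.norm(T)
    print(f"relative truncation error: {err:.2f}")

On random weights the truncation error is necessarily large; the compression-with-accuracy claim rests on trained weight matrices admitting low-rank tensor structure, which the chapter exploits and then refines by BP fine-tuning.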


Notes

  1. Stride is assumed to be 1 and there is no padding on the input data (see the output-size formula after these notes).

  2. Diagrammatic notation of tensors is detailed in [27].

  3. Provided the loss function is the Euclidean distance.

  4. The improvement in accuracy is mainly due to the increased rank value, since both tensor-train and quantization techniques are applied to maintain the \(64 \times \) compression rate.
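For reference, the convolution output size under the assumptions of Note 1: with input width \(W_{\mathrm{in}}\), kernel width \(K\), stride \(s\), and padding \(p\) (standard notation, not symbols from this chapter), the output width is \(W_{\mathrm{out}} = (W_{\mathrm{in}} - K + 2p)/s + 1\), which for \(s = 1\) and \(p = 0\) reduces to \(W_{\mathrm{out}} = W_{\mathrm{in}} - K + 1\).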

References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: a system for large-scale machine learning. In: OSDI, vol 16, pp 265–283

  2. Amor BB, Su J, Srivastava A (2016) Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Trans Pattern Anal Mach Intell 38(1):1–13

  3. Annadani Y, Rakshith D, Biswas S (2016) Sliding dictionary based sparse representation for action recognition. arXiv:1611.00218

  4. Bengio Y, Lamblin P, Popovici D, Larochelle H et al (2007) Greedy layer-wise training of deep networks. Adv Neural Inf Process Syst 19:153

  5. Chen K, Li S, Muralimanohar N, Ahn JH, Brockman JB, Jouppi NP (2012) CACTI-3DD: architecture-level modeling for 3D die-stacked DRAM main memory. In: Proceedings of the conference on design, automation and test in Europe, Dresden, Germany, pp 33–38

  6. Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: Proceedings of the 2015 design, automation and test in Europe conference and exhibition. EDA Consortium, pp 854–859

  7. Chen W, Wilson JT, Tyree S, Weinberger KQ, Chen Y (2015) Compressing neural networks with the hashing trick. In: International conference on machine learning, Lille, France, pp 2285–2294

  8. Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications, Oslo, Norway

  9. Cichocki A (2014) Era of big data processing: a new approach via tensor networks and tensor decompositions. arXiv:1403.2048

  10. Cireşan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) High-performance neural networks for visual object classification. arXiv:1102.0183

  11. Collins MD, Kohli P (2014) Memory bounded deep convolutional networks. arXiv:1412.1442

  12. Davis A, Arel I (2013) Low-rank approximations for conditional feedforward computation in deep neural networks. arXiv:1312.4461

  13. Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Senior A, Tucker P, Yang K, Le QV et al (2012) Large scale distributed deep networks. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 1223–1231

  14. Deng J, Berg A, Satheesh S, Su H, Khosla A, Fei-Fei L (2012) ImageNet large scale visual recognition competition 2012 (ILSVRC2012)

  15. Denil M, Shakibi B, Dinh L, de Freitas N et al (2013) Predicting parameters in deep learning. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 2148–2156

  16. Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, Montreal, Canada, pp 1269–1277

  17. Fei W, Yu H, Zhang W, Yeo KS (2012) Design exploration of hybrid CMOS and memristor circuit by new modified nodal analysis. IEEE Trans Very Large Scale Integr (VLSI) Syst 20(6):1012–1025

  18. Fothergill S, Mentis H, Kohli P, Nowozin S (2012) Instructing people for training gestural interactive systems. In: Proceedings of the SIGCHI conference on human factors in computing systems, Austin, Texas, pp 1737–1746

  19. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, Sardinia, Italy, pp 249–256

  20. Govoreanu B, Kar G, Chen Y, Paraschiv V, Kubicek S, Fantini A, Radu I, Goux L, Clima S, Degraeve R et al (2011) \(10 \times 10\) nm\(^2\) Hf/HfO\(_x\) crossbar resistive RAM with excellent performance, reliability and low-energy operation. In: International electron devices meeting, Washington, DC, pp 31–36

  21. Hagan MT, Demuth HB, Beale MH, De Jesús O (1996) Neural network design, vol 20. PWS Publishing Company, Boston

  22. Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv:1510.00149

  23. Han S et al (2016) DSD: regularizing deep neural networks with dense-sparse-dense training flow. arXiv:1607.04381

  24. Han S et al (2017) ESE: efficient speech recognition engine with sparse LSTM on FPGA. In: International symposium on field-programmable gate arrays, Monterey, California, pp 75–84

  25. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531

  26. Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

  27. Holtz S, Rohwedder T, Schneider R (2012) The alternating linear scheme for tensor optimization in the tensor train format. SIAM J Sci Comput 34(2):A683–A713

  28. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501

  29. Huang H, Yu H (2018) LTNN: a layer-wise tensorized compression of multilayer neural network. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2018.2869974

  30. Huang H, Ni L, Yu H (2017) LTNN: an energy-efficient machine learning accelerator on 3D CMOS-RRAM for layer-wise tensorized neural network. In: 2017 30th IEEE international system-on-chip conference (SOCC). IEEE, pp 280–285

  31. Huang H, Ni L, Wang K, Wang Y, Yu H (2018) A highly parallel and energy efficient three-dimensional multilayer CMOS-RRAM accelerator for tensorized neural network. IEEE Trans Nanotechnol 17(4):645–656. https://doi.org/10.1109/TNANO.2017.2732698

  32. Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Quantized neural networks: training neural networks with low precision weights and activations. arXiv:1609.07061

  33. Hubara I, Soudry D, Yaniv RE (2016) Binarized neural networks. arXiv:1602.02505

  34. Hussein ME, Torki M, Gowayyed MA, El-Saban M (2013) Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. In: International joint conference on artificial intelligence, Menlo Park, California, pp 2466–2472

  35. Kasun LLC, Yang Y, Huang GB, Zhang Z (2016) Dimension reduction with extreme learning machine. IEEE Trans Image Process 25(8):3906–3918

  36. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  37. Krizhevsky A, Nair V, Hinton G (2014) The CIFAR-10 dataset

  38. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  39. LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits

  40. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  41. Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3D points. In: Computer vision and pattern recognition workshops, San Francisco, California, pp 9–14

  42. Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE international solid-state circuits conference, San Francisco, California

  43. Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  44. Liu Z, Li Y, Ren F, Yu H (2016) A binary convolutional encoder-decoder network for real-time natural scene text processing. arXiv:1612.03630

  45. Martens J, Sutskever I (2011) Learning recurrent neural networks with Hessian-free optimization. In: International conference on machine learning, Bellevue, Washington, pp 1033–1040

  46. Mellempudi N, Kundu A, Mudigere D, Das D, Kaul B, Dubey P (2017) Ternary neural networks with fine-grained quantization. arXiv:1705.01462

  47. Micron Technology Inc (2017) Breakthrough nonvolatile memory technology. http://www.micron.com/about/emerging-technologies/3d-xpoint-technology/. Accessed 04 Jan 2018

  48. Migacz S (2017) 8-bit inference with TensorRT. http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf. Accessed 04 Jan 2018

  49. Ni L, Huang H, Liu Z, Joshi RV, Yu H (2017) Distributed in-memory computing on binary RRAM crossbar. ACM J Emerg Technol Comput Syst (JETC) 13(3):36. https://doi.org/10.1145/2996192

  50. Novikov A, Podoprikhin D, Osokin A, Vetrov DP (2015) Tensorizing neural networks. In: Advances in neural information processing systems, Montreal, Canada, pp 442–450

  51. Nvidia (2017) GPU specs. http://www.nvidia.com/object/workstation-solutions.html. Accessed 30 Mar 2017

  52. Oseledets IV (2011) Tensor-train decomposition. SIAM J Sci Comput 33(5):2295–2317

  53. Oseledets IV, Dolgov S (2012) Solution of linear systems and matrix inversion in the TT-format. SIAM J Sci Comput 34(5):A2718–A2739

  54. Poremba M, Mittal S, Li D, Vetter JS, Xie Y (2015) DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches. In: Design, automation and test in Europe conference, Grenoble, France, pp 1543–1546

  55. Rosenberg A (2009) Linear regression with regularization. http://eniac.cs.qc.cuny.edu/andrew/gcml/lecture5.pdf

  56. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  57. Tang J, Deng C, Huang GB (2016) Extreme learning machine for multilayer perceptron. IEEE Trans Neural Netw Learn Syst 27(4):809–821

  58. Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer, Berlin

  59. Vedaldi A, Lenc K (2015) MatConvNet: convolutional neural networks for MATLAB. In: International conference on multimedia, Brisbane, Australia, pp 689–692

  60. Wang J, Liu Z, Chorowski J, Chen Z, Wu Y (2012) Robust 3D action recognition with random occupancy patterns. In: Computer vision (ECCV). Springer, pp 872–885

  61. Wang P, Li Z, Hou Y, Li W (2016) Action recognition based on joint trajectory maps using convolutional neural networks. In: ACM multimedia conference, Amsterdam, The Netherlands, pp 102–106

  62. Wang Y, Zhang C, Nadipalli R, Yu H, Weerasekera R (2012) Design exploration of 3D stacked non-volatile memory by conductive bridge based crossbar. In: IEEE international conference on 3D system integration, Osaka, Japan

  63. Wang Y, Yu H, Zhang W (2014) Nonvolatile CBRAM-crossbar-based 3D-integrated hybrid memory for data retention. IEEE Trans Very Large Scale Integr (VLSI) Syst 22(5):957–970

  64. Wang Y, Huang H, Ni L, Yu H, Yan M, Weng C, Yang W, Zhao J (2015) An energy-efficient non-volatile in-memory accelerator for sparse-representation based face recognition. In: Design, automation and test in Europe conference and exhibition (DATE). IEEE, pp 932–935

  65. Wang Y, Li X, Xu K, Ren F, Yu H (2017) Data-driven sampling matrix boolean optimization for energy-efficient biomedical signal acquisition by compressive sensing. IEEE Trans Biomed Circuits Syst 11(2):255–266

  66. Xia L, Chen CC, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints. In: Computer vision and pattern recognition workshops, Providence, Rhode Island, pp 20–27

  67. Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Annual conference of the international speech communication association, Lyon, France, pp 2365–2369

  68. Yang S, Yuan C, Hu W, Ding X (2014) A hierarchical model based on latent Dirichlet allocation for action recognition. In: International conference on pattern recognition, Stockholm, Sweden, pp 2613–2618

  69. Yang Z, Moczulski M, Denil M, de Freitas N, Smola A, Song L, Wang Z (2015) Deep fried convnets. In: IEEE international conference on computer vision, Santiago, Chile, pp 1476–1483

  70. Yu S et al (2013) 3D vertical RRAM-scaling limit analysis and demonstration of 3D array operation. In: Symposium on VLSI technology and circuits, Kyoto, Japan, pp 158–159

  71. Zhou L, Li W, Zhang Y, Ogunbona P, Nguyen DT, Zhang H (2014) Discriminative key pose extraction using extended LC-KSVD for action recognition. In: International conference on digital image computing: techniques and applications, New South Wales, Australia, pp 1–8

Author information

Correspondence to Hantao Huang.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Huang, H., Yu, H. (2019). Tensor-Solver for Deep Neural Network. In: Compact and Fast Machine Learning Accelerator for IoT Devices. Computer Architecture and Design Methodologies. Springer, Singapore. https://doi.org/10.1007/978-981-13-3323-1_4
