Abstract
This chapter introduces a tensorized formulation for compressing neural networks during training. By reshaping neural network weight matrices into high-dimensional tensors with a low-rank decomposition, significant compression can be achieved while maintaining accuracy. A layer-wise training algorithm for the tensorized multilayer neural network, based on the modified alternating least-squares (MALS) method, is further introduced. The proposed TNN algorithm provides state-of-the-art results on various benchmarks with a significant compression rate, and accuracy can be further improved by fine-tuning with backward propagation (BP). Significant compression rates are achieved on the MNIST and CIFAR-10 datasets. In addition, a 3D multilayer CMOS-RRAM accelerator architecture is proposed for energy-efficient and highly parallel computation (Figures and illustrations may be reproduced from [29, 30, 31]).
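The tensor-train decomposition underlying this compression [67] factors a d-way tensor into a chain of small 3-way cores via sequential truncated SVDs. The sketch below is a minimal illustration, not the chapter's implementation; the 256 × 256 weight matrix, the 16 × 16 × 16 × 16 reshaping, and the rank cap are assumptions chosen for the example:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    """Factor a d-way tensor into tensor-train (TT) cores via
    sequential truncated SVDs (the TT-SVD procedure)."""
    shape = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(shape[0], -1)
    for k in range(len(shape) - 1):
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))            # truncate to the rank cap
        cores.append(u[:, :r].reshape(rank, shape[k], r))
        # carry the remainder to the next unfolding
        mat = (s[:r, None] * vt[:r]).reshape(r * shape[k + 1], -1)
        rank = r
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into the full tensor."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=1)
    return full.squeeze(axis=(0, -1))

# Illustrative example: a 256 x 256 weight matrix reshaped into a
# 16 x 16 x 16 x 16 tensor, then decomposed with TT-rank capped at 16.
W = np.random.default_rng(0).standard_normal((256, 256))
cores = tt_svd(W.reshape(16, 16, 16, 16), max_rank=16)
print([c.shape for c in cores])
```

Each core stores at most \(r \times n_k \times r\) entries, so storage grows linearly in the number of modes rather than exponentially; the rank cap controls the accuracy/compression trade-off.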
Notes
- 1.
The stride is assumed to be 1 and no padding is applied to the input data.
- 2.
Diagrammatic notation of tensors is detailed in [27].
- 3.
Provided the loss function is the Euclidean distance.
- 4.
The improvement in accuracy is mainly due to the increased rank value, since both tensor-train and quantization techniques are applied to maintain the \(64 \times \) compression rate.
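The trade-off in this note can be made concrete by counting parameters: a TT representation stores \(\sum_k r_{k-1} n_k r_k\) entries in place of the dense \(\prod_k n_k\). The mode sizes and ranks below are illustrative assumptions, not the chapter's actual configuration:

```python
from math import prod

def tt_params(mode_sizes, ranks):
    """Parameters stored by TT cores, with boundary ranks ranks[0] = ranks[-1] = 1."""
    return sum(ranks[k] * mode_sizes[k] * ranks[k + 1]
               for k in range(len(mode_sizes)))

modes = [32, 32, 32, 32]     # dense tensor: 32**4 = 1,048,576 parameters
ranks = [1, 8, 8, 8, 1]      # raising the TT-ranks recovers accuracy
compression = prod(modes) / tt_params(modes, ranks)
print(round(compression))    # compression factor from the decomposition alone
```

Quantizing the remaining parameters (e.g. 32-bit to 8-bit weights) multiplies the factor further, which is why the rank can be raised while a fixed overall rate such as \(64 \times \) is maintained.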
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. In: OSDI, vol 16, pp 265–283
Amor BB, Su J, Srivastava A (2016) Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Trans Pattern Anal Mach Intell 38(1):1–13
Annadani Y, Rakshith D, Biswas S (2016) Sliding dictionary based sparse representation for action recognition. arXiv:161100218
Bengio Y, Lamblin P, Popovici D, Larochelle H et al (2007) Greedy layer-wise training of deep networks. Adv Neural Inf Process Syst 19:153
Chen K, Li S, Muralimanohar N, Ahn JH, Brockman JB, Jouppi NP (2012) CACTI-3DD: architecture-level modeling for 3D die-stacked DRAM main memory. In: Proceedings of the conference on design, automation and test in Europe, Dresden, Germany, pp 33–38
Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: IEEE Proceedings of the 2015 design, automation and test in Europe conference and exhibition, EDA consortium, pp 854–859
Chen W, Wilson JT, Tyree S, Weinberger KQ, Chen Y (2015) Compressing neural networks with the hashing trick. In: International conference on machine learning, Lille, France, pp 2285–2294
Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications, Oslo, Norway
Cichocki A (2014) Era of big data processing: a new approach via tensor networks and tensor decompositions. arXiv:14032048
Cireşan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) High-performance neural networks for visual object classification. arXiv:11020183
Collins MD, Kohli P (2014) Memory bounded deep convolutional networks. arXiv:14121442
Davis A, Arel I (2013) Low-rank approximations for conditional feedforward computation in deep neural networks. arXiv:13124461
Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Senior A, Tucker P, Yang K, Le QV et al (2012) Large scale distributed deep networks. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 1223–1231
Deng J, Berg A, Satheesh S, Su H, Khosla A, Fei-Fei L (2012) ImageNet large scale visual recognition competition 2012 (ILSVRC2012)
Denil M, Shakibi B, Dinh L, de Freitas N et al (2013) Predicting parameters in deep learning. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 2148–2156
Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, Montreal, Canada, pp 1269–1277
Fei W, Yu H, Zhang W, Yeo KS (2012) Design exploration of hybrid CMOS and memristor circuit by new modified nodal analysis. IEEE Trans Very Large Scale Integr (VLSI) Syst 20(6):1012–1025
Fothergill S, Mentis H, Kohli P, Nowozin S (2012) Instructing people for training gestural interactive systems. In: Proceedings of the SIGCHI conference on human factors in computing systems, Austin, Texas, pp 1737–1746
Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, Sardinia, Italy, pp 249–256
Govoreanu B, Kar G, Chen Y, Paraschiv V, Kubicek S, Fantini A, Radu I, Goux L, Clima S, Degraeve R et al (2011) 10 \(\times \) 10 nm\(^2\) Hf/HfO\(_x\) crossbar resistive RAM with excellent performance, reliability and low-energy operation. In: International electron devices meeting, Washington, DC, pp 31–36
Hagan MT, Demuth HB, Beale MH, De Jesús O (1996) Neural network design, vol 20. PWS Publishing Company, Boston
Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv:151000149
Han S et al (2016) DSD: regularizing deep neural networks with dense-sparse-dense training flow. arXiv:160704381
Han S et al (2017) ESE: efficient speech recognition engine with sparse LSTM on FPGA. In: International symposium on field-programmable gate arrays, Monterey, California, pp 75–84
Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:150302531
Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554
Holtz S, Rohwedder T, Schneider R (2012) The alternating linear scheme for tensor optimization in the tensor train format. SIAM J Sci Comput 34(2):A683–A713
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
Huang H, Yu H (2018) LTNN: a layer-wise tensorized compression of multilayer neural network. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2018.2869974
Huang H, Ni L, Yu H (2017) LTNN: an energy-efficient machine learning accelerator on 3d CMOS-RRAM for layer-wise tensorized neural network. In: 2017 30th IEEE international system-on-chip conference (SOCC). IEEE, pp 280–285
Huang H, Ni L, Wang K, Wang Y, Yu H (2018) A highly parallel and energy efficient three-dimensional multilayer CMOS-RRAM accelerator for tensorized neural network. IEEE Trans Nanotechnol 17(4):645–656. https://doi.org/10.1109/TNANO.2017.2732698
Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Quantized neural networks: training neural networks with low precision weights and activations. arXiv:160907061
Hubara I, Soudry D, Yaniv RE (2016) Binarized neural networks. arXiv:160202505
Hussein ME, Torki M, Gowayyed MA, El-Saban M (2013) Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. In: International joint conference on artificial intelligence, Menlo Park, California, pp 2466–2472
Kasun LLC, Yang Y, Huang GB, Zhang Z (2016) Dimension reduction with extreme learning machine. IEEE Trans Image Process 25(8):3906–3918
Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980
Krizhevsky A, Nair V, Hinton G (2014) The CIFAR-10 dataset
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3D points. In: Computer vision and pattern recognition workshops, San Francisco, California, pp 9–14
Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE international solid-state circuits conference, San Francisco, California
Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml
Liu Z, Li Y, Ren F, Yu H (2016) A binary convolutional encoder-decoder network for real-time natural scene text processing. arXiv:161203630
Martens J, Sutskever I (2011) Learning recurrent neural networks with Hessian-free optimization. In: International conference on machine learning, Bellevue, Washington, pp 1033–1040
Mellempudi N, Kundu A, Mudigere D, Das D, Kaul B, Dubey P (2017) Ternary neural networks with fine-grained quantization. arXiv:170501462
Micron Technology Inc (2017) Breakthrough nonvolatile memory technology. http://www.micron.com/about/emerging-technologies/3d-xpoint-technology/. Accessed 04 Jan 2018
Migacz S (2017) 8-bit inference with TensorRT. http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf. Accessed 04 Jan 2018
Ni L, Huang H, Liu Z, Joshi RV, Yu H (2017) Distributed in-memory computing on binary RRAM crossbar. ACM J Emerg Technol Comput Syst (JETC) 13(3):36. https://doi.org/10.1145/2996192
Novikov A, Podoprikhin D, Osokin A, Vetrov DP (2015) Tensorizing neural networks. In: Advances in neural information processing systems, Montreal, Canada, pp 442–450
Nvidia (2017) GPU specs. http://www.nvidia.com/object/workstation-solutions.html. Accessed 30 Mar 2017
Oseledets IV (2011) Tensor-train decomposition. SIAM J Sci Comput 33(5):2295–2317
Oseledets IV, Dolgov S (2012) Solution of linear systems and matrix inversion in the TT-format. SIAM J Sci Comput 34(5):A2718–A2739
Poremba M, Mittal S, Li D, Vetter JS, Xie Y (2015) DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches. In: Design, automation and test in Europe conference, Grenoble, France, pp 1543–1546
Rosenberg A (2009) Linear regression with regularization. http://eniac.cs.qc.cuny.edu/andrew/gcml/lecture5.pdf
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:14091556
Tang J, Deng C, Huang GB (2016) Extreme learning machine for multilayer perceptron. IEEE Trans Neural Netw Learn Syst 27(4):809–821
Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer, Berlin
Vedaldi A, Lenc K (2015) MatConvNet: convolutional neural networks for MATLAB. In: International conference on multimedia, Brisbane, Australia, pp 689–692
Wang J, Liu Z, Chorowski J, Chen Z, Wu Y (2012) Robust 3D action recognition with random occupancy patterns. In: Computer vision (ECCV). Springer, pp 872–885
Wang P, Li Z, Hou Y, Li W (2016) Action recognition based on joint trajectory maps using convolutional neural networks. In: ACM multimedia conference, Amsterdam, The Netherlands, pp 102–106
Wang Y, Zhang C, Nadipalli R, Yu H, Weerasekera R (2012) Design exploration of 3D stacked non-volatile memory by conductive bridge based crossbar. In: IEEE international conference on 3D system integration, Osaka, Japan
Wang Y, Yu H, Zhang W (2014) Nonvolatile CBRAM-crossbar-based 3D-integrated hybrid memory for data retention. IEEE Trans Very Large Scale Integr (VLSI) Syst 22(5):957–970
Wang Y, Huang H, Ni L, Yu H, Yan M, Weng C, Yang W, Zhao J (2015) An energy-efficient non-volatile in-memory accelerator for sparse-representation based face recognition. In: Design, automation and test in Europe conference and exhibition (DATE), 2015. IEEE, pp 932–935
Wang Y, Li X, Xu K, Ren F, Yu H (2017) Data-driven sampling matrix boolean optimization for energy-efficient biomedical signal acquisition by compressive sensing. IEEE Trans Biomed Circuits Syst 11(2):255–266
Xia L, Chen CC, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints. In: Computer vision and pattern recognition workshops, Providence, Rhode Island, pp 20–27
Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Annual conference of the international speech communication association, Lyon, France, pp 2365–2369
Yang S, Yuan C, Hu W, Ding X (2014) A hierarchical model based on latent Dirichlet allocation for action recognition. In: International conference on pattern recognition, Stockholm, Sweden, pp 2613–2618
Yang Z, Moczulski M, Denil M, de Freitas N, Smola A, Song L, Wang Z (2015) Deep fried convnets. In: IEEE international conference on computer vision, Santiago, Chile, pp 1476–1483
Yu S et al (2013) 3D vertical RRAM-scaling limit analysis and demonstration of 3D array operation. In: Symposium on VLSI technology and circuits, Kyoto, Japan, pp 158–159
Zhou L, Li W, Zhang Y, Ogunbona P, Nguyen DT, Zhang H (2014) Discriminative key pose extraction using extended LC-KSVD for action recognition. In: International conference on digital image computing: techniques and applications, New South Wales, Australia, pp 1–8
© 2019 Springer Nature Singapore Pte Ltd.
Cite this chapter
Huang, H., Yu, H. (2019). Tensor-Solver for Deep Neural Network. In: Compact and Fast Machine Learning Accelerator for IoT Devices. Computer Architecture and Design Methodologies. Springer, Singapore. https://doi.org/10.1007/978-981-13-3323-1_4
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3322-4
Online ISBN: 978-981-13-3323-1
eBook Packages: Intelligent Technologies and Robotics (R0)