Tensor-Solver for Deep Neural Network

Chapter in Compact and Fast Machine Learning Accelerator for IoT Devices

Part of the book series: Computer Architecture and Design Methodologies (CADM)

Abstract

This chapter introduces a tensorized formulation for compressing neural networks during training. By reshaping neural network weight matrices into high-dimensional tensors with low-rank decomposition, significant compression can be achieved while maintaining accuracy. A layer-wise training algorithm for the tensorized multilayer neural network is further introduced, based on the modified alternating least-squares (MALS) method. The proposed tensorized neural network (TNN) algorithm provides state-of-the-art results on various benchmarks at significant compression rates. The accuracy can be further improved by fine-tuning with backward propagation (BP). Significant compression rates are achieved on the MNIST and CIFAR-10 datasets. In addition, a 3D multi-layer CMOS-RRAM accelerator architecture is proposed for energy-efficient and highly parallel computation. (Figures and illustrations may be reproduced from [29, 30, 31].)
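As a concrete illustration of the tensorized formulation, the sketch below reshapes a fully-connected weight matrix into a high-dimensional tensor and compresses it with TT-SVD, the successive-truncated-SVD construction of the tensor-train format [52]. It is a minimal NumPy sketch under assumed shapes and an assumed rank cap, not the chapter's training algorithm (which optimizes the tensor cores directly via MALS [27]).

    import numpy as np

    def tt_svd(tensor, max_rank):
        """Split `tensor` into TT cores G_k of shape (r_{k-1}, n_k, r_k)."""
        dims = tensor.shape
        cores, r_prev = [], 1
        c = tensor.reshape(dims[0], -1)              # first unfolding
        for k in range(len(dims) - 1):
            u, s, vt = np.linalg.svd(c, full_matrices=False)
            r = min(max_rank, s.size)                # truncate to the rank cap
            cores.append(u[:, :r].reshape(r_prev, dims[k], r))
            c = (s[:r, None] * vt[:r]).reshape(r * dims[k + 1], -1)
            r_prev = r
        cores.append(c.reshape(r_prev, dims[-1], 1))
        return cores

    def tt_reconstruct(cores):
        """Contract the cores back into the full tensor."""
        out = cores[0]
        for core in cores[1:]:
            out = np.tensordot(out, core, axes=([-1], [0]))
        return out[0, ..., 0]                        # drop the boundary ranks

    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256))              # a dense layer's weights
    T = W.reshape((4,) * 8)                          # tensorize: 256*256 = 4**8
    cores = tt_svd(T, max_rank=8)
    tt_params = sum(core.size for core in cores)
    print(f"dense {W.size} -> TT {tt_params} ({W.size / tt_params:.0f}x smaller)")
    err = np.linalg.norm(tt_reconstruct(cores) - T) / np.linalg.norm(T)
    print(f"relative truncation error: {err:.2f}")

On random weights the truncation error is necessarily large; the compression-with-accuracy claim rests on trained weight matrices admitting low-rank tensor structure, which the chapter exploits and then refines by BP fine-tuning.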


Notes

  1. Stride is assumed to be 1 and there is no padding on the input data (see the output-size formula after these notes).

  2. Diagrammatic notation of tensors is detailed in [27].

  3. Provided the loss function is the Euclidean distance.

  4. The improvement in accuracy is mainly due to the increased rank value, since both tensor-train and quantization techniques are applied to maintain the \(64 \times \) compression rate.
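For reference, the convolution output size under the assumptions of Note 1: with input width \(W_{\mathrm{in}}\), kernel width \(K\), stride \(s\), and padding \(p\) (standard notation, not symbols from this chapter), the output width is \(W_{\mathrm{out}} = (W_{\mathrm{in}} - K + 2p)/s + 1\), which for \(s = 1\) and \(p = 0\) reduces to \(W_{\mathrm{out}} = W_{\mathrm{in}} - K + 1\).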

References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) TensorFlow: a system for large-scale machine learning. In: OSDI, vol 16, pp 265–283

  2. Amor BB, Su J, Srivastava A (2016) Action recognition using rate-invariant analysis of skeletal shape trajectories. IEEE Trans Pattern Anal Mach Intell 38(1):1–13

  3. Annadani Y, Rakshith D, Biswas S (2016) Sliding dictionary based sparse representation for action recognition. arXiv:1611.00218

  4. Bengio Y, Lamblin P, Popovici D, Larochelle H et al (2007) Greedy layer-wise training of deep networks. Adv Neural Inf Process Syst 19:153

  5. Chen K, Li S, Muralimanohar N, Ahn JH, Brockman JB, Jouppi NP (2012) CACTI-3DD: architecture-level modeling for 3D die-stacked DRAM main memory. In: Proceedings of the conference on design, automation and test in Europe, Dresden, Germany, pp 33–38

  6. Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: Proceedings of the 2015 design, automation and test in Europe conference and exhibition. EDA Consortium, pp 854–859

  7. Chen W, Wilson JT, Tyree S, Weinberger KQ, Chen Y (2015) Compressing neural networks with the hashing trick. In: International conference on machine learning, Lille, France, pp 2285–2294

  8. Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications, Oslo, Norway

  9. Cichocki A (2014) Era of big data processing: a new approach via tensor networks and tensor decompositions. arXiv:1403.2048

  10. Cireşan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J (2011) High-performance neural networks for visual object classification. arXiv:1102.0183

  11. Collins MD, Kohli P (2014) Memory bounded deep convolutional networks. arXiv:1412.1442

  12. Davis A, Arel I (2013) Low-rank approximations for conditional feedforward computation in deep neural networks. arXiv:1312.4461

  13. Dean J, Corrado G, Monga R, Chen K, Devin M, Mao M, Senior A, Tucker P, Yang K, Le QV et al (2012) Large scale distributed deep networks. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 1223–1231

  14. Deng J, Berg A, Satheesh S, Su H, Khosla A, Fei-Fei L (2012) ImageNet large scale visual recognition competition 2012 (ILSVRC2012)

  15. Denil M, Shakibi B, Dinh L, de Freitas N et al (2013) Predicting parameters in deep learning. In: Advances in neural information processing systems, Lake Tahoe, Nevada, pp 2148–2156

  16. Denton EL, Zaremba W, Bruna J, LeCun Y, Fergus R (2014) Exploiting linear structure within convolutional networks for efficient evaluation. In: Advances in neural information processing systems, Montreal, Canada, pp 1269–1277

  17. Fei W, Yu H, Zhang W, Yeo KS (2012) Design exploration of hybrid CMOS and memristor circuit by new modified nodal analysis. IEEE Trans Very Large Scale Integr (VLSI) Syst 20(6):1012–1025

  18. Fothergill S, Mentis H, Kohli P, Nowozin S (2012) Instructing people for training gestural interactive systems. In: Proceedings of the SIGCHI conference on human factors in computing systems, Austin, Texas, pp 1737–1746

  19. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. In: International conference on artificial intelligence and statistics, Sardinia, Italy, pp 249–256

  20. Govoreanu B, Kar G, Chen Y, Paraschiv V, Kubicek S, Fantini A, Radu I, Goux L, Clima S, Degraeve R et al (2011) \(10 \times 10\) nm\(^2\) Hf/HfO\(_x\) crossbar resistive RAM with excellent performance, reliability and low-energy operation. In: International electron devices meeting, Washington, DC, pp 31–36

  21. Hagan MT, Demuth HB, Beale MH, De Jesús O (1996) Neural network design, vol 20. PWS Publishing Company, Boston

  22. Han S, Mao H, Dally WJ (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv:1510.00149

  23. Han S et al (2016) DSD: regularizing deep neural networks with dense-sparse-dense training flow. arXiv:1607.04381

  24. Han S et al (2017) ESE: efficient speech recognition engine with sparse LSTM on FPGA. In: International symposium on field-programmable gate arrays, Monterey, California, pp 75–84

  25. Hinton G, Vinyals O, Dean J (2015) Distilling the knowledge in a neural network. arXiv:1503.02531

  26. Hinton GE, Osindero S, Teh YW (2006) A fast learning algorithm for deep belief nets. Neural Comput 18(7):1527–1554

  27. Holtz S, Rohwedder T, Schneider R (2012) The alternating linear scheme for tensor optimization in the tensor train format. SIAM J Sci Comput 34(2):A683–A713

  28. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501

  29. Huang H, Yu H (2018) LTNN: a layer-wise tensorized compression of multilayer neural network. IEEE Trans Neural Netw Learn Syst. https://doi.org/10.1109/TNNLS.2018.2869974

  30. Huang H, Ni L, Yu H (2017) LTNN: an energy-efficient machine learning accelerator on 3D CMOS-RRAM for layer-wise tensorized neural network. In: 2017 30th IEEE international system-on-chip conference (SOCC). IEEE, pp 280–285

  31. Huang H, Ni L, Wang K, Wang Y, Yu H (2018) A highly parallel and energy efficient three-dimensional multilayer CMOS-RRAM accelerator for tensorized neural network. IEEE Trans Nanotechnol 17(4):645–656. https://doi.org/10.1109/TNANO.2017.2732698

  32. Hubara I, Courbariaux M, Soudry D, El-Yaniv R, Bengio Y (2016) Quantized neural networks: training neural networks with low precision weights and activations. arXiv:1609.07061

  33. Hubara I, Soudry D, Yaniv RE (2016) Binarized neural networks. arXiv:1602.02505

  34. Hussein ME, Torki M, Gowayyed MA, El-Saban M (2013) Human action recognition using a temporal hierarchy of covariance descriptors on 3D joint locations. In: International joint conference on artificial intelligence, Menlo Park, California, pp 2466–2472

  35. Kasun LLC, Yang Y, Huang GB, Zhang Z (2016) Dimension reduction with extreme learning machine. IEEE Trans Image Process 25(8):3906–3918

  36. Kingma DP, Ba J (2014) Adam: a method for stochastic optimization. arXiv:1412.6980

  37. Krizhevsky A, Nair V, Hinton G (2014) The CIFAR-10 dataset

  38. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324

  39. LeCun Y, Cortes C, Burges CJ (1998) The MNIST database of handwritten digits

  40. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444

  41. Li W, Zhang Z, Liu Z (2010) Action recognition based on a bag of 3D points. In: Computer vision and pattern recognition workshops, San Francisco, California, pp 9–14

  42. Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE international solid-state circuits conference, San Francisco, California

  43. Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  44. Liu Z, Li Y, Ren F, Yu H (2016) A binary convolutional encoder-decoder network for real-time natural scene text processing. arXiv:1612.03630

  45. Martens J, Sutskever I (2011) Learning recurrent neural networks with Hessian-free optimization. In: International conference on machine learning, Bellevue, Washington, pp 1033–1040

  46. Mellempudi N, Kundu A, Mudigere D, Das D, Kaul B, Dubey P (2017) Ternary neural networks with fine-grained quantization. arXiv:1705.01462

  47. Micron Technology Inc (2017) Breakthrough nonvolatile memory technology. http://www.micron.com/about/emerging-technologies/3d-xpoint-technology/. Accessed 04 Jan 2018

  48. Migacz S (2017) 8-bit inference with TensorRT. http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf. Accessed 04 Jan 2018

  49. Ni L, Huang H, Liu Z, Joshi RV, Yu H (2017) Distributed in-memory computing on binary RRAM crossbar. ACM J Emerg Technol Comput Syst (JETC) 13(3):36. https://doi.org/10.1145/2996192

  50. Novikov A, Podoprikhin D, Osokin A, Vetrov DP (2015) Tensorizing neural networks. In: Advances in neural information processing systems, Montreal, Canada, pp 442–450

  51. Nvidia (2017) GPU specs. http://www.nvidia.com/object/workstation-solutions.html. Accessed 30 Mar 2017

  52. Oseledets IV (2011) Tensor-train decomposition. SIAM J Sci Comput 33(5):2295–2317

  53. Oseledets IV, Dolgov S (2012) Solution of linear systems and matrix inversion in the TT-format. SIAM J Sci Comput 34(5):A2718–A2739

  54. Poremba M, Mittal S, Li D, Vetter JS, Xie Y (2015) DESTINY: a tool for modeling emerging 3D NVM and eDRAM caches. In: Design, automation and test in Europe conference, Grenoble, France, pp 1543–1546

  55. Rosenberg A (2009) Linear regression with regularization. http://eniac.cs.qc.cuny.edu/andrew/gcml/lecture5.pdf

  56. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  57. Tang J, Deng C, Huang GB (2016) Extreme learning machine for multilayer perceptron. IEEE Trans Neural Netw Learn Syst 27(4):809–821

  58. Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer, Berlin

  59. Vedaldi A, Lenc K (2015) MatConvNet: convolutional neural networks for MATLAB. In: International conference on multimedia, Brisbane, Australia, pp 689–692

  60. Wang J, Liu Z, Chorowski J, Chen Z, Wu Y (2012) Robust 3D action recognition with random occupancy patterns. In: Computer vision (ECCV). Springer, pp 872–885

  61. Wang P, Li Z, Hou Y, Li W (2016) Action recognition based on joint trajectory maps using convolutional neural networks. In: ACM multimedia conference, Amsterdam, The Netherlands, pp 102–106

  62. Wang Y, Zhang C, Nadipalli R, Yu H, Weerasekera R (2012) Design exploration of 3D stacked non-volatile memory by conductive bridge based crossbar. In: IEEE international conference on 3D system integration, Osaka, Japan

  63. Wang Y, Yu H, Zhang W (2014) Nonvolatile CBRAM-crossbar-based 3D-integrated hybrid memory for data retention. IEEE Trans Very Large Scale Integr (VLSI) Syst 22(5):957–970

  64. Wang Y, Huang H, Ni L, Yu H, Yan M, Weng C, Yang W, Zhao J (2015) An energy-efficient non-volatile in-memory accelerator for sparse-representation based face recognition. In: Design, automation and test in Europe conference and exhibition (DATE). IEEE, pp 932–935

  65. Wang Y, Li X, Xu K, Ren F, Yu H (2017) Data-driven sampling matrix boolean optimization for energy-efficient biomedical signal acquisition by compressive sensing. IEEE Trans Biomed Circuits Syst 11(2):255–266

  66. Xia L, Chen CC, Aggarwal J (2012) View invariant human action recognition using histograms of 3D joints. In: Computer vision and pattern recognition workshops, Providence, Rhode Island, pp 20–27

  67. Xue J, Li J, Gong Y (2013) Restructuring of deep neural network acoustic models with singular value decomposition. In: Annual conference of the international speech communication association, Lyon, France, pp 2365–2369

  68. Yang S, Yuan C, Hu W, Ding X (2014) A hierarchical model based on latent Dirichlet allocation for action recognition. In: International conference on pattern recognition, Stockholm, Sweden, pp 2613–2618

  69. Yang Z, Moczulski M, Denil M, de Freitas N, Smola A, Song L, Wang Z (2015) Deep fried convnets. In: IEEE international conference on computer vision, Santiago, Chile, pp 1476–1483

  70. Yu S et al (2013) 3D vertical RRAM-scaling limit analysis and demonstration of 3D array operation. In: Symposium on VLSI technology and circuits, Kyoto, Japan, pp 158–159

  71. Zhou L, Li W, Zhang Y, Ogunbona P, Nguyen DT, Zhang H (2014) Discriminative key pose extraction using extended LC-KSVD for action recognition. In: International conference on digital image computing: techniques and applications, New South Wales, Australia, pp 1–8

Author information

Correspondence to Hantao Huang.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Huang, H., Yu, H. (2019). Tensor-Solver for Deep Neural Network. In: Compact and Fast Machine Learning Accelerator for IoT Devices. Computer Architecture and Design Methodologies. Springer, Singapore. https://doi.org/10.1007/978-981-13-3323-1_4
