Least-Squares-Solver for Shallow Neural Network

  • Hantao Huang
  • Hao Yu
Part of the Computer Architecture and Design Methodologies book series (CADM)


This chapter presents a least-squares-based learning method for a single-hidden-layer neural network. A square-root-free Cholesky decomposition technique is applied to reduce the training complexity. The optimized learning algorithm is then mapped onto both CMOS-based and RRAM-based hardware, and the two implementations are presented in detail. The analysis of the hardware implementations shows significant speed-up and energy-efficiency improvements compared with CPU- and GPU-based implementations (Figures and illustrations may be reproduced from [11, 12]).
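The training scheme outlined above can be illustrated with a small sketch: hidden-layer weights are fixed at random, and the output weights are obtained in closed form from the regularized normal equations, factored with a square-root-free Cholesky (LDL^T) decomposition. This is an illustrative NumPy sketch only, not the chapter's hardware-mapped implementation; the function names, the tanh activation, and the regularization constant are assumptions for the example.

```python
import numpy as np

def ldl_decompose(A):
    """Square-root-free Cholesky (LDL^T) factorization of a symmetric
    positive-definite matrix A: returns unit-lower-triangular L and the
    diagonal d such that A = L @ diag(d) @ L.T (no square roots taken)."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - (L[j, :j] ** 2) @ d[:j]
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - (L[i, :j] * L[j, :j]) @ d[:j]) / d[j]
    return L, d

def train_slfn(X, T, n_hidden, reg=1e-3, seed=0):
    """Least-squares training of a single-hidden-layer network.
    X: (n_samples, n_features) inputs; T: (n_samples, n_outputs) targets.
    Input weights are random and fixed; only the output weights beta are
    learned, by solving the regularized normal equations via LDL^T."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                # hidden-layer activation matrix
    A = H.T @ H + reg * np.eye(n_hidden)  # regularized Gram matrix (SPD)
    L, d = ldl_decompose(A)
    # Solve A @ beta = H.T @ T in two triangular steps:
    y = np.linalg.solve(L, H.T @ T)            # L y = H^T T
    beta = np.linalg.solve(L.T, y / d[:, None])  # D L^T beta = y
    return W, b, beta
```

Because the hidden layer is fixed, the whole optimization reduces to one symmetric linear solve, which is the structure the chapter's accelerator exploits; avoiding square roots in the factorization simplifies the arithmetic units needed in hardware.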


Machine learning · Cholesky decomposition · Neural network · FPGA


References

  1. (2016) ADM-PCIE-7V3. Accessed 13 June 2016
  2.
  3. Akinaga H, Shima H (2010) Resistive random access memory (ReRAM) based on metal oxides. Proc IEEE 98(12):2237–2251
  4. Aljarah I, Faris H, Mirjalili S (2018) Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput 22(1):1–15
  5. Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: Proceedings of the 2015 design, automation and test in Europe conference and exhibition. EDA Consortium, pp 854–859
  6. Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications, Oslo, Norway
  7. Decherchi S, Gastaldo P, Leoncini A, Zunino R (2012) Efficient digital implementation of extreme learning machines for classification. IEEE Trans Circuits Syst II: Express Briefs 59(8):496–500
  8. Franzon P, Rotenberg E, Tuck J, Davis WR, Zhou H, Schabel J, Zhang Z, Dwiel JB, Forbes E, Huh J et al (2015) Computing in 3D. In: Custom integrated circuits conference (CICC), 2015 IEEE. IEEE, California, pp 1–6
  9. Hecht-Nielsen R (1989) Theory of the backpropagation neural network. In: International joint conference on neural networks, Washington, DC, pp 593–605
  10. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501
  11. Huang H, Yu H (2017) Least-squares-solver based machine learning accelerator for real-time data analytics in smart buildings. In: Emerging technology and architecture for big-data analytics. Springer, pp 51–76
  12. Huang H, Ni L, Wang Y, Yu H, Wang Z, Cai Y, Huang R (2016) A 3D multi-layer CMOS-RRAM accelerator for neural network. In: 3D systems integration conference (3DIC), 2016 IEEE international. IEEE, pp 1–5
  13. Igelnik B, Zurada JM (2013) Efficiency and scalability methods for computational intellect, 1st edn. IGI Global
  14. Khan GM (2018) Evolutionary computation. In: Evolution of artificial neural development. Springer, pp 29–37
  15. Kim DH, Athikulwongse K, Lim SK (2013) Study of through-silicon-via impact on the 3-D stacked IC layout. IEEE Trans Very Large Scale Integr (VLSI) Syst 21(5):862–874
  16. Kim KH, Gaba S, Wheeler D, Cruz-Albrecht JM, Hussain T, Srinivasa N, Lu W (2011) A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Lett 12(1):389–395
  17. Krishnamoorthy A, Menon D (2011) Matrix inversion using Cholesky decomposition. arXiv:1111.4144
  18. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Technical report, University of Toronto
  19. Lee H et al (2008) Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM. In: IEEE international electron devices meeting
  20. Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014) Scaling distributed machine learning with the parameter server. In: USENIX symposium on operating systems design and implementation, vol 14, Broomfield, Colorado, pp 583–598
  21. Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE international solid-state circuits conference, San Francisco, California
  22. Lichman M (2013) UCI machine learning repository
  23. Martino MD, Fanelli S, Protasi M (1993) A new improved online algorithm for multi-decisional problems based on MLP-networks using a limited amount of information. In: International joint conference on neural networks, Nagoya, Japan, pp 617–620
  24. Ni L et al (2016) An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar. In: Asia and South Pacific design automation conference, Macao, China, pp 280–285
  25. Pao YH, Park GH, Sobajic DJ (1994) Backpropagation, Part IV: Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180
  26. Qiu J, Wang J, Yao S, Guo K, Li B, Zhou E, Yu J, Tang T, Xu N, Song S et al (2016) Going deeper with embedded FPGA platform for convolutional neural network. In: Proceedings of the 2016 ACM/SIGDA international symposium on field-programmable gate arrays. ACM, Monterey, California, pp 26–35
  27. Ren F, Marković D (2016) A configurable 12–237 kS/s 12.8 mW sparse-approximation engine for mobile data aggregation of compressively sampled physiological signals. IEEE J Solid-State Circuits 51(1):68–78
  28. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484
  29. Suykens JA, Van Gestel T, De Brabanter J (2002) Least squares support vector machines. World Scientific
  30. Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer
  31. Trefethen LN, Bau III D (1997) Numerical linear algebra, vol 50. SIAM
  32. Wang Q, Li P, Kim Y (2015a) A parallel digital VLSI architecture for integrated support vector machine training and classification. IEEE Trans Very Large Scale Integr (VLSI) Syst 23(8):1471–1484
  33. Wang Y, Yu H, Ni L, Huang GB, Yan M, Weng C, Yang W, Zhao J (2015b) An energy-efficient nonvolatile in-memory computing architecture for extreme learning machine by domain-wall nanowire devices. IEEE Trans Nanotechnol 14(6):998–1012
  34. Xia L, Gu P, Li B, Tang T, Yin X, Huangfu W, Yu S, Cao Y, Wang Y, Yang H (2016) Technological exploration of RRAM crossbar array for matrix-vector multiplication. J Comput Sci Technol 31(1):3–19
  35. Zhang C, Li P, Sun G, Guan Y, Xiao B, Cong J (2015) Optimizing FPGA-based accelerator design for deep convolutional neural networks. In: International symposium on field-programmable gate arrays, Monterey, California, pp 161–170

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
  2. Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
