
Least-Squares-Solver for Shallow Neural Network


Part of the book series: Computer Architecture and Design Methodologies ((CADM))

Abstract

This chapter presents least-squares-based learning for a single-hidden-layer neural network. A square-root-free Cholesky decomposition is applied to reduce the training complexity. The optimized learning algorithm is then mapped onto both CMOS-based and RRAM-based hardware, and the two implementations are presented. A detailed analysis of the hardware implementations shows significant speed-up and energy-efficiency improvements compared with CPU- and GPU-based implementations (figures and illustrations may be reproduced from [11, 12]).
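The core technique named above, solving the output weights of a single-hidden-layer network by least squares with a square-root-free (LDL^T) Cholesky factorization, can be sketched in NumPy. This is a generic illustration under assumed shapes and an assumed ridge-regularization term, not the chapter's exact algorithm; the names (`ldl_decompose`, `W_in`, `lam`) are hypothetical.

```python
import numpy as np

def ldl_decompose(A):
    """Square-root-free Cholesky: factor symmetric positive-definite A as L D L^T,
    with L unit lower triangular and D diagonal (stored as a vector)."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - np.sum(L[j, :j] ** 2 * d[:j])
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - np.sum(L[i, :j] * L[j, :j] * d[:j])) / d[j]
    return L, d

def ldl_solve(L, d, B):
    """Solve (L D L^T) X = B by forward substitution, diagonal scaling,
    and back substitution."""
    Y = np.linalg.solve(L, B)       # L Y = B
    Z = Y / d[:, None]              # D Z = Y
    return np.linalg.solve(L.T, Z)  # L^T X = Z

# Single-hidden-layer network trained by least squares (ELM-style):
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))      # input samples
T = rng.standard_normal((200, 3))      # training targets
W_in = rng.standard_normal((8, 64))    # fixed random input weights
H = np.tanh(X @ W_in)                  # hidden-layer activation matrix

# Output weights beta solve the regularized normal equations
# (H^T H + lam*I) beta = H^T T; the left-hand matrix is SPD, so LDL^T applies.
lam = 1e-3
A = H.T @ H + lam * np.eye(64)
L, dvec = ldl_decompose(A)
beta = ldl_solve(L, dvec, H.T @ T)
```

Compared with standard Cholesky, the LDL^T variant avoids the scalar square roots in the factorization loop, which is one motivation for using it when mapping the solver onto fixed-function hardware.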


Notes

  1. PCIe is short for Peripheral Component Interconnect Express.

References

  1. (2016) ADM-PCIE-7V3. http://www.alpha-data.com/dcp/products.php?product=adm-pcie-7v3. Accessed 13 June 2016

  2. (2016) Beagleboard-xm. http://beagleboard.org/beagleboard-xm

  3. Akinaga H, Shima H (2010) Resistive random access memory (ReRAM) based on metal oxides. Proc IEEE 98(12):2237–2251. https://doi.org/10.1109/JPROC.2010.2070830

  4. Aljarah I, Faris H, Mirjalili S (2018) Optimizing connection weights in neural networks using the whale optimization algorithm. Soft Comput 22(1):1–15


  5. Chen PY, Kadetotad D, Xu Z, Mohanty A, Lin B, Ye J, Vrudhula S, Seo JS, Cao Y, Yu S (2015) Technology-design co-optimization of resistive cross-point array for accelerating learning algorithms on chip. In: Proceedings of the 2015 Design, Automation and Test in Europe conference and exhibition. EDA Consortium, pp 854–859


  6. Chen YC, Wang W, Li H, Zhang W (2012) Non-volatile 3D stacking RRAM-based FPGA. In: IEEE international conference on field programmable logic and applications. Oslo, Norway


  7. Decherchi S, Gastaldo P, Leoncini A, Zunino R (2012) Efficient digital implementation of extreme learning machines for classification. IEEE Trans Circuits Syst II: Express Briefs 59(8):496–500


  8. Franzon P, Rotenberg E, Tuck J, Davis WR, Zhou H, Schabel J, Zhang Z, Dwiel JB, Forbes E, Huh J et al (2015) Computing in 3D. In: Custom integrated circuits conference (CICC), 2015 IEEE. IEEE, California, pp 1–6


  9. Hecht-Nielsen R (1989) Theory of the backpropagation neural network. In: International joint conference on neural networks, Washington, DC, pp 593–605


  10. Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: theory and applications. Neurocomputing 70(1):489–501


  11. Huang H, Yu H (2017) Least-squares-solver based machine learning accelerator for real-time data analytics in smart buildings. In: Emerging technology and architecture for big-data analytics, Springer, pp 51–76. https://doi.org/10.1007/978-3-319-54840-1_3

  12. Huang H, Ni L, Wang Y, Yu H, Wang Z, Cai Y, Huang R (2016) A 3D multi-layer CMOS-RRAM accelerator for neural network. In: 3D Systems Integration Conference (3DIC), 2016 IEEE International, IEEE, pp 1–5. https://doi.org/10.1109/3DIC.2016.7970014

  13. Igelnik B, Igelnik B, Zurada JM (2013) Efficiency and scalability methods for computational intellect, 1st edn. IGI Global


  14. Khan GM (2018) Evolutionary computation. In: Evolution of artificial neural development, Springer, pp 29–37


  15. Kim DH, Athikulwongse K, Lim SK (2013) Study of through-silicon-via impact on the 3-D stacked IC layout. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 21(5):862–874


  16. Kim KH, Gaba S, Wheeler D, Cruz-Albrecht JM, Hussain T, Srinivasa N, Lu W (2011) A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Lett. 12(1):389–395


  17. Krishnamoorthy A, Menon D (2011) Matrix inversion using Cholesky decomposition. arXiv:1111.4144

  18. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images


  19. Lee H et al (2008) Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM. In: IEEE International Electron Devices Meeting


  20. Li M, Andersen DG, Park JW, Smola AJ, Ahmed A, Josifovski V, Long J, Shekita EJ, Su BY (2014) Scaling distributed machine learning with the parameter server. In: USENIX Symposium on Operating Systems Design and Implementation, vol 14. Broomfield, Colorado, pp 583–598


  21. Liauw YY, Zhang Z, Kim W, El Gamal A, Wong SS (2012) Nonvolatile 3D-FPGA with monolithically stacked RRAM-based configuration memory. In: IEEE international solid-state circuits conference, San Francisco, California


  22. Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml

  23. Martino MD, Fanelli S, Protasi M (1993) A new improved online algorithm for multi-decisional problems based on MLP-networks using a limited amount of information. In: International joint conference on neural networks. Nagoya, Japan, pp 617–620


  24. Ni L et al (2016) An energy-efficient matrix multiplication accelerator by distributed in-memory computing on binary RRAM crossbar. In: Asia and South Pacific design automation conference. Macao, China, pp 280–285


  25. Pao YH, Park GH, Sobajic DJ (1994) Backpropagation, Part IV Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180. https://doi.org/10.1016/0925-2312(94)90053-1

  26. Qiu J, Wang J, Yao S, Guo K, Li B, Zhou E, Yu J, Tang T, Xu N, Song S et al (2016) Going deeper with embedded FPGA platform for convolutional neural network. In: Proceedings of the 2016 ACM/SIGDA international symposium on field-programmable gate arrays, ACM, Monterey, California, pp 26–35


  27. Ren F, Marković D (2016) A configurable 12-to-237 kS/s 12.8 mW sparse-approximation engine for mobile data aggregation of compressively sampled physiological signals. IEEE J Solid-State Circuits 51(1):68–78


  28. Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484


  29. Suykens JA, Van Gestel T, De Brabanter J (2002) Least squares support vector machines. World Scientific


  30. Topaloglu RO (2015) More than Moore technologies for next generation computer design. Springer


  31. Trefethen LN, Bau III D (1997) Numerical linear algebra, vol 50. SIAM


  32. Wang Q, Li P, Kim Y (2015a) A parallel digital VLSI architecture for integrated support vector machine training and classification. IEEE Trans. Very Large Scale Integr. Syst. 23(8):1471–1484


  33. Wang Y, Yu H, Ni L, Huang GB, Yan M, Weng C, Yang W, Zhao J (2015b) An energy-efficient nonvolatile in-memory computing architecture for extreme learning machine by domain-wall nanowire devices. IEEE Trans. Nanotechnol. 14(6):998–1012


  34. Xia L, Gu P, Li B, Tang T, Yin X, Huangfu W, Yu S, Cao Y, Wang Y, Yang H (2016) Technological exploration of RRAM crossbar array for matrix-vector multiplication. J. Comput. Sci. Technol. 31(1):3–19


  35. Zhang C, Li P, Sun G, Guan Y, Xiao B, Cong J (2015) Optimizing FPGA-based accelerator design for deep convolutional neural networks. In: International symposium on field-programmable gate arrays. Monterey, California, pp 161–170



Author information

Correspondence to Hantao Huang.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Huang, H., Yu, H. (2019). Least-Squares-Solver for Shallow Neural Network. In: Compact and Fast Machine Learning Accelerator for IoT Devices. Computer Architecture and Design Methodologies. Springer, Singapore. https://doi.org/10.1007/978-981-13-3323-1_3
