Abstract
Smart cameras have been applied successfully in many fields. However, their limited battery capacity and power efficiency restrict their local processing capacity. To shift vision processing closer to the sensors, we propose a power-efficient framework for analog approximate computing with emerging metal-oxide resistive switching random-access memory (RRAM) devices. We first introduce a programmable RRAM-based approximate computing unit (RRAM-ACU) to accelerate approximated computation, and then build a scalable approximate computing framework on top of the RRAM-ACU. To program the RRAM-ACU efficiently, we also present a detailed configuration flow that includes a customized approximator training scheme, an algorithm that maps approximator parameters to RRAM states, and an RRAM state tuning scheme. Simulation results on a set of diverse benchmarks demonstrate that, compared with an x86-64 CPU at 2 GHz, the RRAM-ACU achieves 4.06–196.41× speedup and a power efficiency of 24.59–567.98 GFLOPS/W, with an average quality loss of 8.72%. An implementation of the HMAX application further demonstrates that the proposed RRAM-based approximate computing framework achieves more than 12.8× higher power efficiency than its digital counterparts (CPU, GPU, and FPGA).
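The RRAM-ACU builds on the analog matrix-vector multiplication that an RRAM crossbar performs natively: with input voltages V_i applied to the rows, each column collects a current I_j = Σ_i V_i G_ij by Ohm's law and Kirchhoff's current law. The Python sketch below illustrates this under stated assumptions: ideal devices with no wire resistance or variation, a hypothetical conductance range G_MIN to G_MAX, and a simple linear weight-to-conductance mapping that uses a differential pair of columns for signed weights. It is not the chapter's parameter-to-RRAM-state mapping algorithm or its state tuning scheme, and the function names are hypothetical.

```python
# Minimal sketch of analog matrix-vector multiplication on an RRAM crossbar.
# Assumptions (not taken from the chapter): ideal devices, no wire resistance,
# a linear weight-to-conductance mapping, and a differential pair of crossbar
# columns per weight so that negative weights can be represented.

G_MIN = 1e-6  # hypothetical minimum device conductance (siemens)
G_MAX = 1e-4  # hypothetical maximum device conductance (siemens)


def map_weights(weights):
    """Map a real-valued weight matrix to two conductance matrices (G+, G-)."""
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = (G_MAX - G_MIN) / w_max  # siemens per unit weight
    g_pos = [[G_MIN + scale * max(w, 0.0) for w in row] for row in weights]
    g_neg = [[G_MIN + scale * max(-w, 0.0) for w in row] for row in weights]
    return g_pos, g_neg, scale


def crossbar_currents(voltages, conductances):
    """Column currents I_j = sum_i V_i * G_ij (Ohm's law + Kirchhoff's current law)."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


def analog_mvm(weights, x):
    """Compute y_j = sum_i x_i * W_ij from the differential column currents."""
    g_pos, g_neg, scale = map_weights(weights)
    i_pos = crossbar_currents(x, g_pos)
    i_neg = crossbar_currents(x, g_neg)
    # The shared G_MIN offset cancels in the difference, leaving scale * (W^T x).
    return [(ip - ineg) / scale for ip, ineg in zip(i_pos, i_neg)]


W = [[1.0, -2.0], [0.5, 3.0]]  # rows = inputs, columns = outputs
x = [0.2, 0.1]                 # input voltages (volts)
print(analog_mvm(W, x))        # approximately [0.25, -0.1]
```

The differential pair is one common way to represent negative weights with strictly positive conductances; because both columns carry the same G_MIN offset, it cancels when the two currents are subtracted.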
Notes
1. A neural network tends to overfit when many of its weights are large [28]. Overfitting occurs when a model learns too much from the training data, including its noise; the trained model then performs poorly on unseen test data that the training set does not cover.
2. We use ℓ2 regularization in the training scheme. Regularization is widely used in neural network training to limit the magnitude of the network weights, avoid overfitting, and improve model generalization [28]. Specifically, ℓ2 regularization adds to the loss function of the network a penalty proportional to the squared 2-norm of the network weights, so that the network error and the weight magnitude are balanced and optimized simultaneously during training [28].
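Written out, the ℓ2-regularized training objective is E(w) + λ‖w‖₂², where E is the network error and λ ≥ 0 sets the penalty strength (E and λ are our notation, not the chapter's). The Python sketch below is a minimal illustration assuming a single linear neuron, a mean-squared-error E, and plain batch gradient descent; it is not the chapter's customized approximator training scheme, and the helper names are hypothetical. On a toy data set that a linear neuron fits exactly, λ = 0 recovers the unregularized weights, while λ = 0.5 pulls them toward zero.

```python
# Minimal sketch of l2-regularized training for a single linear neuron.
# Assumptions (ours, not the chapter's): mean-squared-error loss, batch
# gradient descent, and hypothetical helper names predict, l2_loss, train.


def predict(w, x):
    """Output of a linear neuron: the dot product w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def l2_loss(w, xs, ys, lam):
    """Mean squared error plus lam times the squared 2-norm of w."""
    mse = sum((predict(w, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + lam * sum(wi * wi for wi in w)


def train(xs, ys, lam, lr=0.05, epochs=2000):
    """Minimize l2_loss by batch gradient descent, starting from zero weights."""
    n = len(xs)
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        # Gradient of the penalty lam * ||w||^2 is 2 * lam * w.
        grad = [2.0 * lam * wi for wi in w]
        # Gradient of the MSE term is (2 / n) * sum(err * x).
        for x, y in zip(xs, ys):
            err = predict(w, x) - y
            for k, xk in enumerate(x):
                grad[k] += 2.0 * err * xk / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 2.0, 3.0]
print(train(xs, ys, lam=0.0))  # approximately [1.0, 2.0]
print(train(xs, ys, lam=0.5))  # approximately [0.8, 1.2]
```

The regularized weights fit the training data less exactly but have a smaller 2-norm, which is the balance between network error and weight amplitude that the note describes.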
References
Graf R, Belbachir A, King R, Mayerhofer M (2013) Quality control of real-time panoramic views from the smart camera 360 scan. In: 2013 IEEE international symposium on circuits and systems (ISCAS), pp 650–653
Esmaeilzadeh H, Sampson A, Ceze L, Burger D (2012) Neural acceleration for general-purpose approximate programs. In: International symposium on microarchitecture (MICRO), pp 449–460
DARPA (2012) Power efficiency revolution for embedded computing technologies [Online]. Available: https://www.fbo.gov/
NVIDIA Tesla K-Series, DATASHEET (2012) Kepler family product overview [Online]. Available: http://www.nvidia.com/content/tesla/pdf/tesla-kseries-overview-lr.pdf
Intel (2016) Intel microprocessor export compliance metrics
Esmaeilzadeh H, Blem E, Aman RS, Sankaralingam K, Burger D (2011) Dark silicon and the end of multicore scaling. In: 2011 38th annual international symposium on computer architecture (ISCA). IEEE, pp 365–376
Li B, Shan Y, Hu M, Wang Y, Chen Y, Yang H. Memristor-based approximated computation. In: Low power electronics and design (ISLPED), pp 242–247
Xu C, Dong X, Jouppi NP, Xie Y (2011) Design implications of memristor-based RRAM cross-point structures. In: Design, automation & test in Europe conference & exhibition (DATE). IEEE, pp 1–6
Jo SH, Chang T, Ebong I, Bhadviya BB, Mazumder P, Lu W (2010) Nanoscale memristor device as synapse in neuromorphic systems. Nano Lett 10(4):1297–1301
Hu M, Li H, Wu Q, Rose GS (2012) Hardware realization of BSB recall function using memristor crossbar arrays. In: Design automation conference, pp 498–503
Chakradhar S, Raghunathan A (2010) Best-effort computing: re-thinking parallel software and hardware. In: 47th ACM/IEEE design automation conference (DAC), pp 865–870
Ye R, Wang T, Yuan F, Kumar R, Xu Q (2013) On reconfiguration-oriented approximate adder design and its application. In: Proceedings of the international conference on computer-aided design. IEEE, pp 48–54
Venkataramani S, Chippa VK, Chakradhar ST, Roy K, Raghunathan A (2013) Quality programmable vector processors for approximate computing. In: Proceedings of the 46th annual IEEE/ACM international symposium on microarchitecture. ACM, pp 1–12
Wong HSP, Lee H-Y, Yu S, Chen Y-S, Wu Y, Chen P-S, Lee B, Chen F, Tsai M-J (2012) Metal-oxide RRAM. Proc IEEE 100(6):1951–1970
Yu S, Gao B, Fang Z, Yu H, Kang J, Wong H-SP (2013) A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation. Adv Mater 25(12):1774–1779
Deng Y, Huang P, Chen B, Yang X, Gao B, Wang J, Zeng L, Du G, Kang J, Liu X (2013) RRAM crossbar array with cell selection device: a device and circuit interaction study. IEEE Trans Electron Devices 60(2):719–726
Alibart F, Gao L, Hoskins BD, Strukov DB (2012) High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm. Nanotechnology 23(7):075201
Guan X, Yu S, Wong H-S (2012) A SPICE compact model of metal oxide resistive switching memory with variations. IEEE Electron Device Lett 33(10):1405–1407
Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
Ito Y (1994) Approximation capability of layered neural networks with sigmoid units on two layers. Neural Comput 6(6):1233–1243
Gu P, Li B, Tang T, Yu S, Cao Y, Wang Y, Yang H (2015) Technological exploration of RRAM crossbar array for matrix-vector multiplication. In: The 20th Asia and south pacific design automation conference (ASPDAC). IEEE, pp 106–111
Cannizzaro SO, Grasso AD, Mita R, Palumbo G, Pennisi S (2007) Design procedures for three-stage CMOS OTAs with nested-Miller compensation. IEEE Trans Circuits Syst I Regul Pap 54(5):933–940
Oh W, Bakkaloglu B (2007) A CMOS low-dropout regulator with current-mode feedback buffer amplifier. IEEE Trans Circuits Syst II Express Briefs 54(10):922–926
Allen PE, Holberg DR (2002) CMOS analog circuit design. Oxford University Press, Oxford
Li B, Wang Y, Chen Y, Li HH, Yang H (2014) ICE: inline calibration for memristor crossbar-based computing engine. In: Proceedings of the conference on design, automation & test in Europe. European Design and Automation Association, p 184
Khodabandehloo G, Mirhassani M, Ahmadi M (2012) Analog implementation of a novel resistive-type sigmoidal neuron. IEEE Trans Very Large Scale Integr (VLSI) Syst 20(4):750–754
Fausett L (ed) (1994) Fundamentals of neural networks: architectures, algorithms, and applications. Prentice-Hall, Inc., Upper Saddle River
Girosi F, Jones M, Poggio T (1995) Regularization theory and neural networks architectures. Neural Comput 7(2):219–269
Bedeschi F, Fackenthal R, Resta C, Donze E, Jagasivamani M, Buda E, Pellizzer F, Chow D, Cabrini A, Calvi G, Faravelli R, Fantini A, Torelli G, Mills D, Gastaldi R, Casagrande G (2009) A bipolar-selected phase change memory featuring multi-level cell storage. IEEE J Solid-State Circuits 44(1):217–227
Lee H, Chen P, Wu T, Chen Y, Wang C, Tzeng P, Lin C, Chen F, Lien C, Tsai M (2008) Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM. In: IEEE international electron devices meeting (IEDM), pp 1–4
Kannan S, Rajendran J, Karri R, Sinanoglu O (2013) Sneak-path testing of crossbar-based nonvolatile random access memories. IEEE Trans Nanotechnol 12(3):413–426
ITRS (2013) International technology roadmap for semiconductors
Gulati K, Lee H-S (1998) A high-swing CMOS telescopic operational amplifier. IEEE J Solid-State Circuits 33(12):2010–2019
Kull L, Toifl T, Schmatz M, Francese PA, Menolfi C, Braendli M, Kossel M, Morf T, Andersen TM, Leblebici Y (2013) A 3.1 mW 8b 1.2 GS/s single-channel asynchronous SAR ADC with alternate comparators for enhanced speed in 32 nm digital SOI CMOS. In: 2013 IEEE international solid-state circuits conference digest of technical papers (ISSCC). IEEE, pp 468–469
Lin W-T, Kuo T-H (2013) A 12b 1.6 GS/s 40 mW DAC in 40 nm CMOS with >70 dB SFDR over entire Nyquist bandwidth. In: 2013 IEEE international solid-state circuits conference digest of technical papers (ISSCC). IEEE, pp 474–475
Mutch J, Lowe DG (2008) Object class recognition and localization using sparse features with limited receptive fields. Int J Comput Vision 80(1):45–57
Maashri AA, Debole M, Cotter M, Chandramoorthy N, Xiao Y, Narayanan V, Chakrabarti C (2012) Accelerating neuromorphic vision algorithms for recognition. In: Proceedings of the 49th annual design automation conference, pp 579–584
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput Vision 88(2):303–338
Tang T, Xia L, Li B, Luo R, Chen Y, Wang Y, Yang H (2015) Spiking neural network with RRAM: can we use it for real-world application? In: Design, automation & test in Europe conference & exhibition (DATE), pp 860–865
Liu C, Yan B, Yang C, Song L, Li Z, Liu B, Chen Y, Li H, Wu Q, Jiang H (2015) A spiking neuromorphic design with resistive crossbar. In: Proceedings of the 52nd annual design automation conference. ACM, p 14
Seo J-S, Lin B, Kim M, Chen P-Y, Kadetotad D, Xu Z, Mohanty A, Vrudhula S, Yu S, Ye J et al. (2015) On-chip sparse learning acceleration with CMOS and resistive synaptic devices. IEEE Trans Nanotechnol 14(6):969–979
Mazady A, Rahman MT, Forte D, Anwar M (2015) Memristor PUF—a security primitive: theory and experiment. IEEE J Emerging Sel Top Circuits Syst 5(2):222–229
Bojnordi M, Ipek E (2016) Memristive Boltzmann machine: a hardware accelerator for combinatorial optimization and deep learning. In: International symposium on high performance computer architecture (HPCA)
Li B, Gu P, Shan Y, Wang Y, Chen Y, Yang H. RRAM-based analog approximate computing. IEEE Trans Comput Aided Des Integr Circuits Syst 34(12):1905–1917
Kim K-H, Gaba S, Wheeler D, Cruz-Albrecht JM, Hussain T, Srinivasa N, Lu W (2011) A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Lett 12(1):389–395
Liu X, Mao M, Liu B, Li H, Chen Y, Li B, Wang Y, Jiang H, Barnell M, Wu Q et al. (2015) RENO: a high-efficient reconfigurable neuromorphic computing accelerator design. In: 2015 52nd ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
Xia L, Li B, Tang T, Gu P, Yin X, Huangfu W, Chen P-Y, Yu S, Cao Y, Wang Y, Xie Y, Yang H (2016) MNSIM: simulation platform for memristor-based neuromorphic computing system. In: Proceedings of the conference on design, automation & test in Europe. European Design and Automation Association
Li B, Xia L, Gu P, Wang Y, Yang H. Merging the interface: power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system. In: 2015 52nd ACM/EDAC/IEEE design automation conference (DAC), pp 1–6
Liu B, Li H, Chen Y, Li X, Wu Q, Huang T (2015) VORTEX: variation-aware training for memristor x-bar. In: Proceedings of the 52nd annual design automation conference. ACM, p 15
Prezioso M, Merrikh-Bayat F, Hoskins B, Adam G, Likharev KK, Strukov DB (2015) Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521(7550):61–64
Liu B, Li H, Chen Y, Li X, Huang T, Wu Q, Barnell M. Reduction and IR-drop compensations techniques for reliable neuromorphic computing systems. In: 2014 IEEE/ACM international conference on computer-aided design (ICCAD). IEEE, pp 63–70
Wen W, Wu C-R, Hu X, Liu B, Ho T-Y, Li X, Chen Y (2015) An EDA framework for large scale hybrid neuromorphic computing systems. In: Proceedings of the 52nd annual design automation conference. ACM, p 12
Acknowledgements
This work was supported by 973 Project 2013CB329000, National Natural Science Foundation of China (No. 61373026), Brain Inspired Computing Research, Tsinghua University (20141080934), Tsinghua University Initiative Scientific Research Program, and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions.
Copyright information
© 2017 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Wang, Y., Li, B., Xia, L., Tang, T., Yang, H. (2017). Energy Efficient RRAM Crossbar-Based Approximate Computing for Smart Cameras. In: Kyung, CM., Yasuura, H., Liu, Y., Lin, YL. (eds) Smart Sensors and Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-33201-7_5
DOI: https://doi.org/10.1007/978-3-319-33201-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33200-0
Online ISBN: 978-3-319-33201-7
eBook Packages: Engineering, Engineering (R0)