Energy Efficient RRAM Crossbar-Based Approximate Computing for Smart Cameras

Wang, Yu; Li, Boxun; Xia, Lixue; Tang, Tianqi; Yang, Huazhong

doi:10.1007/978-3-319-33201-7_5

Chapter
First Online: 17 October 2016

Abstract

Smart cameras have been applied successfully in many fields, but limited battery capacity and power efficiency restrict their local processing capacity. To shift vision processing closer to the sensors, we propose a power-efficient framework for analog approximate computing with emerging metal-oxide resistive switching random-access memory (RRAM) devices. A programmable RRAM-based approximate computing unit (RRAM-ACU) is first introduced to accelerate approximate computation, and a scalable approximate computing framework is then built on top of the RRAM-ACU. To program the RRAM-ACU efficiently, we also present a detailed configuration flow, which includes a customized approximator training scheme, an approximator-parameter-to-RRAM-state mapping algorithm, and an RRAM state tuning scheme. Simulation results on a set of diverse benchmarks demonstrate that, compared with an x86-64 CPU at 2 GHz, the RRAM-ACU achieves a 4.06–196.41× speedup and a power efficiency of 24.59–567.98 GFLOPS/W with an average quality loss of 8.72%. An implementation of the HMAX application further demonstrates that the proposed RRAM-based approximate computing framework achieves more than 12.8× the power efficiency of its digital counterparts (CPU, GPU, and FPGA).

Notes

1. A neural network tends to overfit when many of its weights are large [28]. Overfitting occurs when the model learns too much from the training data, including its noise. The trained model then has poor predictive performance on unseen test data that the training set does not cover.
2. We use ℓ₂ regularization in the training scheme. Regularization is a technique widely used in neural network training to limit the amplitude of the network weights, avoid overfitting, and improve model generalization [28]. Specifically, ℓ₂ regularization adds a penalty proportional to the square of the 2-norm of the network weights to the loss function of the network. The error of the network and the amplitude of the weights are therefore balanced and optimized simultaneously during training [28].
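
The ℓ₂ penalty described in Note 2 can be illustrated with a minimal sketch. It assumes a mean-squared-error loss and a hypothetical proportionality coefficient `lam`; the notes specify neither, and the function names and toy values are illustrative only, not the authors' training scheme.

```python
def l2_penalty(weights):
    """Square of the 2-norm of all weights: the sum of w**2 over every weight."""
    return sum(w * w for layer in weights for row in layer for w in row)


def regularized_loss(errors, weights, lam):
    """Mean squared error plus lam times the squared 2-norm of the weights.

    The MSE term and the coefficient `lam` are assumptions for illustration.
    """
    mse = sum(e * e for e in errors) / len(errors)
    return mse + lam * l2_penalty(weights)


# Hypothetical toy values: two weight matrices and three prediction errors.
weights = [[[0.5, -1.0], [2.0, 0.0]], [[1.0, -0.5]]]
errors = [0.1, -0.2, 0.3]

plain = regularized_loss(errors, weights, lam=0.0)       # error term only
penalized = regularized_loss(errors, weights, lam=0.01)  # error plus weight penalty
print(plain, penalized)
```

Because the penalty grows with the square of each weight, minimizing the combined loss trades a small increase in error for smaller weight amplitudes, which is the balance Note 2 refers to.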

References

1. Graf R, Belbachir A, King R, Mayerhofer M (2013) Quality control of real-time panoramic views from the smart camera 360 scan. In: 2013 IEEE international symposium on circuits and systems (ISCAS), pp 650–653
2. Esmaeilzadeh H, Sampson A, Ceze L, Burger D (2012) Neural acceleration for general-purpose approximate programs. In: International symposium on microarchitecture (MICRO), pp 449–460
3. DARPA (2012) Power efficiency revolution for embedded computing technologies [Online]. Available: https://www.fbo.gov/
4. NVIDIA Tesla K-Series, DATASHEET (2012) Kepler family product overview [Online]. Available: http://www.nvidia.com/content/tesla/pdf/tesla-kseries-overview-lr.pdf
5. Intel (2016) Intel microprocessor export compliance metrics
6. Esmaeilzadeh H, Blem E, Aman RS, Sankaralingam K, Burger D (2011) Dark silicon and the end of multicore scaling. In: 2011 38th annual international symposium on computer architecture (ISCA). IEEE, pp 365–376
7. Li B, Shan Y, Hu M, Wang Y, Chen Y, Yang H, Memristor-based approximated computation. In: Low power electronics and design (ISLPED), pp 242–247
8. Xu C, Dong X, Jouppi NP, Xie Y (2011) Design implications of memristor-based RRAM cross-point structures. In: Design, automation & test in Europe conference & exhibition (DATE). IEEE, pp 1–6
9. Jo SH, Chang T, Ebong I, Bhadviya BB, Mazumder P, Lu W (2010) Nanoscale memristor device as synapse in neuromorphic systems. Nano Lett 10(4):1297–1301
10. Hu M, Li H, Wu Q, Rose GS (2012) Hardware realization of BSB recall function using memristor crossbar arrays. In: Design automation conference, pp 498–503
11. Chakradhar S, Raghunathan A (2010) Best-effort computing: re-thinking parallel software and hardware. In: 47th ACM/IEEE design automation conference (DAC), pp 865–870
12. Ye R, Wang T, Yuan F, Kumar R, Xu Q (2013) On reconfiguration-oriented approximate adder design and its application. In: Proceedings of the international conference on computer-aided design. IEEE, pp 48–54
13. Venkataramani S, Chippa VK, Chakradhar ST, Roy K, Raghunathan A (2013) Quality programmable vector processors for approximate computing. In: Proceedings of the 46th annual IEEE/ACM international symposium on microarchitecture. ACM, pp 1–12
14. Wong HSP, Lee H-Y, Yu S, Chen Y-S, Wu Y, Chen P-S, Lee B, Chen F, Tsai M-J (2012) Metal-oxide RRAM. Proc IEEE 100(6):1951–1970
15. Yu S, Gao B, Fang Z, Yu H, Kang J, Wong H-SP (2013) A low energy oxide-based electronic synaptic device for neuromorphic visual systems with tolerance to device variation. Adv Mater 25(12):1774–1779
16. Deng Y, Huang P, Chen B, Yang X, Gao B, Wang J, Zeng L, Du G, Kang J, Liu X (2013) RRAM crossbar array with cell selection device: a device and circuit interaction study. IEEE Trans Electron Devices 60(2):719–726
17. Alibart F, Gao L, Hoskins BD, Strukov DB (2012) High precision tuning of state for memristive devices by adaptable variation-tolerant algorithm. Nanotechnology 23(7):075201
18. Guan X, Yu S, Wong H-S (2012) A SPICE compact model of metal oxide resistive switching memory with variations. IEEE Electron Device Lett 33(10):1405–1407
19. Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal approximators. Neural Netw 2(5):359–366
20. Ito Y (1994) Approximation capability of layered neural networks with sigmoid units on two layers. Neural Comput 6(6):1233–1243
21. Gu P, Li B, Tang T, Yu S, Cao Y, Wang Y, Yang H (2015) Technological exploration of RRAM crossbar array for matrix-vector multiplication. In: The 20th Asia and South Pacific design automation conference (ASPDAC). IEEE, pp 106–111
22. Cannizzaro SO, Grasso AD, Mita R, Palumbo G, Pennisi S (2007) Design procedures for three-stage CMOS OTAs with nested-Miller compensation. IEEE Trans Circuits Syst I Regul Pap 54(5):933–940
23. Oh W, Bakkaloglu B (2007) A CMOS low-dropout regulator with current-mode feedback buffer amplifier. IEEE Trans Circuits Syst II Express Briefs 54(10):922–926
24. Allen PE, Holberg DR (2002) CMOS analog circuit design. Oxford University Press, Oxford
25. Li B, Wang Y, Chen Y, Li HH, Yang H (2014) ICE: inline calibration for memristor crossbar-based computing engine. In: Proceedings of the conference on design, automation & test in Europe. European Design and Automation Association, p 184
26. Khodabandehloo G, Mirhassani M, Ahmadi M (2012) Analog implementation of a novel resistive-type sigmoidal neuron. IEEE Trans Very Large Scale Integr (VLSI) Syst 20(4):750–754
27. Fausett L (ed) (1994) Fundamentals of neural networks: architectures, algorithms, and applications. Prentice-Hall, Inc., Upper Saddle River
28. Girosi F, Jones M, Poggio T (1995) Regularization theory and neural networks architectures. Neural Comput 7(2):219–269
29. Bedeschi F, Fackenthal R, Resta C, Donze E, Jagasivamani M, Buda E, Pellizzer F, Chow D, Cabrini A, Calvi G, Faravelli R, Fantini A, Torelli G, Mills D, Gastaldi R, Casagrande G (2009) A bipolar-selected phase change memory featuring multi-level cell storage. IEEE J Solid-State Circuits 44(1):217–227
30. Lee H, Chen P, Wu T, Chen Y, Wang C, Tzeng P, Lin C, Chen F, Lien C, Tsai M (2008) Low power and high speed bipolar switching with a thin reactive Ti buffer layer in robust HfO2 based RRAM. In: IEEE international electron devices meeting (IEDM), pp 1–4
31. Kannan S, Rajendran J, Karri R, Sinanoglu O (2013) Sneak-path testing of crossbar-based nonvolatile random access memories. IEEE Trans Nanotechnol 12(3):413–426
32. ITRS (2013) International technology roadmap for semiconductors
33. Gulati K, Lee H-S (1998) A high-swing CMOS telescopic operational amplifier. IEEE J Solid-State Circuits 33(12):2010–2019
34. Kull L, Toifl T, Schmatz M, Francese PA, Menolfi C, Braendli M, Kossel M, Morf T, Andersen TM, Leblebici Y (2013) A 3.1 mW 8b 1.2 GS/s single-channel asynchronous SAR ADC with alternate comparators for enhanced speed in 32 nm digital SOI CMOS. In: 2013 IEEE international solid-state circuits conference digest of technical papers (ISSCC). IEEE, pp 468–469
35. Lin W-T, Kuo T-H (2013) A 12b 1.6 GS/s 40 mW DAC in 40 nm CMOS with >70 dB SFDR over entire Nyquist bandwidth. In: 2013 IEEE international solid-state circuits conference digest of technical papers (ISSCC). IEEE, pp 474–475
36. Mutch J, Lowe DG (2008) Object class recognition and localization using sparse features with limited receptive fields. Int J Comput Vision 80(1):45–57
37. Maashri AA, Debole M, Cotter M, Chandramoorthy N, Xiao Y, Narayanan V, Chakrabarti C (2012) Accelerating neuromorphic vision algorithms for recognition. In: Proceedings of the 49th annual design automation conference, pp 579–584
38. Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The PASCAL visual object classes (VOC) challenge. Int J Comput Vision 88(2):303–338
39. Tang T, Xia L, Li B, Luo R, Chen Y, Wang Y, Yang H (2015) Spiking neural network with RRAM: can we use it for real-world application? In: Design, automation & test in Europe conference & exhibition (DATE), pp 860–865
40. Liu C, Yan B, Yang C, Song L, Li Z, Liu B, Chen Y, Li H, Wu Q, Jiang H (2015) A spiking neuromorphic design with resistive crossbar. In: Proceedings of the 52nd annual design automation conference. ACM, p 14
41. Seo J-S, Lin B, Kim M, Chen P-Y, Kadetotad D, Xu Z, Mohanty A, Vrudhula S, Yu S, Ye J et al. (2015) On-chip sparse learning acceleration with CMOS and resistive synaptic devices. IEEE Trans Nanotechnol 14(6):969–979
42. Mazady A, Rahman MT, Forte D, Anwar M (2015) Memristor PUF—a security primitive: theory and experiment. IEEE J Emerging Sel Top Circuits Syst 5(2):222–229
43. Bojnordi M, Ipek E (2016) Memristive Boltzmann machine: a hardware accelerator for combinatorial optimization and deep learning. In: International symposium on high performance computer architecture (HPCA)
44. Li B, Gu P, Shan Y, Wang Y, Chen Y, Yang H, RRAM-based analog approximate computing. IEEE Trans Comput Aided Des Integr Circuits Syst 34(12):1905–1917
45. Kim K-H, Gaba S, Wheeler D, Cruz-Albrecht JM, Hussain T, Srinivasa N, Lu W (2011) A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications. Nano Lett 12(1):389–395
46. Liu X, Mao M, Liu B, Li H, Chen Y, Li B, Wang Y, Jiang H, Barnell M, Wu Q et al. (2015) RENO: a high-efficient reconfigurable neuromorphic computing accelerator design. In: 2015 52nd ACM/EDAC/IEEE design automation conference (DAC). IEEE, pp 1–6
47. Xia L, Li B, Tang T, Gu P, Yin X, Huangfu W, Chen P-Y, Yu S, Cao Y, Wang Y, Xie Y, Yang H (2016) MNSIM: simulation platform for memristor-based neuromorphic computing system. In: Proceedings of the conference on design, automation & test in Europe. European Design and Automation Association
48. Li B, Xia L, Gu P, Wang Y, Yang H, Merging the interface: power, area and accuracy co-optimization for RRAM crossbar-based mixed-signal computing system. In: 2015 52nd ACM/EDAC/IEEE design automation conference (DAC), pp 1–6
49. Liu B, Li H, Chen Y, Li X, Wu Q, Huang T (2015) Vortex: variation-aware training for memristor x-bar. In: Proceedings of the 52nd annual design automation conference. ACM, p 15
50. Prezioso M, Merrikh-Bayat F, Hoskins B, Adam G, Likharev KK, Strukov DB (2015) Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521(7550):61–64
51. Liu B, Li H, Chen Y, Li X, Huang T, Wu Q, Barnell M, Reduction and IR-drop compensations techniques for reliable neuromorphic computing systems. In: 2014 IEEE/ACM international conference on computer-aided design (ICCAD). IEEE, pp 63–70
52. Wen W, Wu C-R, Hu X, Liu B, Ho T-Y, Li X, Chen Y (2015) An EDA framework for large scale hybrid neuromorphic computing systems. In: Proceedings of the 52nd annual design automation conference. ACM, p 12

Acknowledgements

This work was supported by 973 Project 2013CB329000, the National Natural Science Foundation of China (No. 61373026), Brain Inspired Computing Research, Tsinghua University (20141080934), the Tsinghua University Initiative Scientific Research Program, and the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions.

Author information

Authors and Affiliations

Department of Electronic Engineering, Tsinghua University, Beijing, 100084, China
Yu Wang, Boxun Li, Lixue Xia, Tianqi Tang & Huazhong Yang

Corresponding author

Correspondence to Yu Wang.

Editor information

Editors and Affiliations

Center for Integrated Smart Sensors, Yuseong-gu, Daejeon, Korea (Republic of)
Chong-Min Kyung
System LSI Research Center, Kyushu University, Fukuoka, Japan
Hiroto Yasuura
Circuits and Systems Division, Tsinghua University, Beijing, China
Yongpan Liu
National Tsing Hua University, Hsinchu, Taiwan
Youn-Long Lin

About this chapter

Cite this chapter

Wang, Y., Li, B., Xia, L., Tang, T., Yang, H. (2017). Energy Efficient RRAM Crossbar-Based Approximate Computing for Smart Cameras. In: Kyung, CM., Yasuura, H., Liu, Y., Lin, YL. (eds) Smart Sensors and Systems. Springer, Cham. https://doi.org/10.1007/978-3-319-33201-7_5

DOI: https://doi.org/10.1007/978-3-319-33201-7_5
Published: 17 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33200-0
Online ISBN: 978-3-319-33201-7
eBook Packages: Engineering, Engineering (R0)
