A Rank Decomposed Statistical Error Compensation Technique for Robust Convolutional Neural Networks in the Near Threshold Voltage Regime

Journal of Signal Processing Systems

Abstract

There has been growing interest in implementing complex machine learning algorithms such as convolutional neural networks (CNNs) on low-power embedded platforms to enable on-device learning and inference. Many of these platforms are to be deployed as low-power sensor nodes with low-to-medium throughput requirements. Near threshold voltage (NTV) designs are well-suited for these applications but suffer from a significant increase in variations. In this paper, we propose a variation-tolerant architecture for CNNs capable of operating in the NTV regime for energy efficiency. A statistical error compensation (SEC) technique referred to as rank decomposed SEC (RD-SEC) is proposed. The key idea is to exploit inherent redundancy within matrix-vector multiplication (or the dot product ensemble), a power-hungry operation in CNNs, to derive low-cost estimators for error detection and compensation. When evaluated in CNNs on both the MNIST and CIFAR-10 datasets, simulation results in 45 nm CMOS show that RD-SEC enables robust CNNs operating in the NTV regime. Specifically, the proposed architecture achieves up to 11× improvement in variation tolerance and up to 113× reduction in the standard deviation of the detection accuracy \(P_{det}\), while incurring marginal degradation in the median detection accuracy.
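
To make the estimator-based detect-and-compensate idea concrete, below is a minimal sketch of generic statistical error compensation applied to a single dot product. This is not the RD-SEC architecture itself: here the low-cost estimator is simply a coarsely quantized replica of the weights, and the 4-bit estimator precision, the threshold tau, and the additive error model are all illustrative assumptions.

```python
import numpy as np

def quantize(x, bits):
    """Coarse quantization standing in for a low-complexity estimator datapath."""
    scale = 2.0 ** (bits - 1)
    m = float(np.max(np.abs(x))) or 1.0
    return np.round(x / m * scale) / scale * m

def sec_dot(w, x, tau, est_bits=4, injected_error=0.0):
    """Detect-and-compensate on one dot product.

    y_main: output of the power-hungry main block, possibly corrupted by an
            NTV-induced timing error (modeled here as an additive term).
    y_est:  output of a low-cost estimator block.
    If the two disagree by more than tau, the estimate replaces the output.
    """
    y_main = float(np.dot(w, x)) + injected_error
    y_est = float(np.dot(quantize(w, est_bits), x))
    return y_est if abs(y_main - y_est) > tau else y_main

rng = np.random.default_rng(0)
w, x = rng.standard_normal(16), rng.standard_normal(16)
print(sec_dot(w, x, tau=2.0))                       # error-free: main output kept
print(sec_dot(w, x, tau=2.0, injected_error=8.0))   # large error: estimate substituted
```

In RD-SEC the estimator is not a separately quantized replica; the paper derives it from redundancy already present in the matrix-vector multiplication, which is what keeps the overhead low (the Appendix quantifies this via the ratio α).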

References

  1. Chen, Y.H., Krishna, T., Emer, J., Sze, V. (2016). Eyeriss: an energy-efficient reconfigurable accelerator for deep convolutional neural networks. In IEEE international solid-state circuits conference (ISSCC).

  2. Chung, J.G., & Parhi, K.K. (2002). Frequency spectrum based low-area low-power parallel FIR filter design. EURASIP Journal on Applied Signal Processing, 2002, 944–953.

  3. Mahesh, R., & Vinod, A. (2010). New reconfigurable architectures for implementing FIR filters with low complexity. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 29(2), 275–288.

  4. Liu, X., Zhou, J., Liao, X., Wang, C., Luo, J., Madihian, M., Je, M. (2012). Ultra-low-energy near-threshold biomedical signal processor for versatile wireless health monitoring. In 2012 IEEE Asian solid state circuits conference (A-SSCC) (pp. 381–384). https://doi.org/10.1109/ASSCC.2012.6570806.

  5. Kim, Y., Hong, I., Yoo, H.J. (2015). 18.3 A 0.5V 54μW ultra-low-power recognition processor with 93.5% accuracy geometric vocabulary tree and 47.5% database compression. In 2015 IEEE international solid-state circuits conference (ISSCC) digest of technical papers (pp. 1–3).

  6. Dreslinski, R., Wieckowski, M., Blaauw, D., Sylvester, D., Mudge, T. (2010). Near-threshold computing: reclaiming Moore’s law through energy efficient integrated circuits. Proceedings of the IEEE, 98(2), 253–266.

  7. Das, S., Blaauw, D., Bull, D., Flautner, K., Aitken, R. (2009). Addressing design margins through error-tolerant circuits. In 46th ACM/IEEE design automation conference (DAC) (pp. 11–12).

  8. Tschanz, J., Bowman, K., Wilkerson, C., Lu, S.L., Karnik, T. (2009). Resilient circuits: enabling energy-efficient performance and reliability. In IEEE/ACM international conference on computer-aided design (ICCAD).

  9. Bahar, R., Mundy, J., Chen, J. (2003). A probabilistic-based design methodology for nanoscale computation. In IEEE/ACM international conference on computer aided design (ICCAD) (pp. 480–486).

  10. Vaidya, N., & Pradhan, D. (1993). Fault-tolerant design strategies for high reliability and safety. IEEE Transactions on Computers, 42(10), 1195–1206.

  11. Shim, B., Sridhara, S., Shanbhag, N. (2004). Reliable low-power digital signal processing via reduced precision redundancy. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 12(5), 497–510.

  12. Choi, J., Kim, E.P., Rutenbar, R.A., Shanbhag, N.R. (2013). Error resilient MRF message passing architecture for stereo matching. In IEEE workshop on signal processing systems (SiPS) (pp. 348–353).

  13. Abdallah, R.A., & Shanbhag, N.R. (2013). Error-resilient systems via statistical signal processing. In IEEE workshop on signal processing systems (SiPS).

  14. Abdallah, R.A., & Shanbhag, N.R. (2013). An energy-efficient ECG processor in 45-nm CMOS using statistical error compensation. IEEE Journal of Solid-State Circuits, 48(11), 2882–2893.

  15. Lin, Y., Zhang, S., Shanbhag, N.R. (2016). Variation-tolerant architectures for convolutional neural networks in the near threshold voltage regime. In 2016 IEEE international workshop on signal processing systems (SiPS). https://doi.org/10.1109/SiPS.2016.11 (pp. 17–22).

  16. Zhang, S., & Shanbhag, N. (2016). Probabilistic error models for machine learning kernels implemented on stochastic nanoscale fabrics. In Design, automation test in Europe (DATE).

  17. Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y. (2009). What is the best multi-stage architecture for object recognition? In IEEE 12th international conference on computer vision (pp. 2146–2153).

  18. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

  19. Han, S., Pool, J., Tran, J., Dally, W. (2015). Learning both weights and connections for efficient neural network. In Advances in neural information processing systems (pp. 1135–1143).

  20. Wang, Y. et al. (2016). Low power convolutional neural networks on a chip. In Proceedings of the IEEE international symposium on circuits and systems (ISCAS) (pp. 129–132).

  21. Courbariaux, M. et al. (2016). BinaryNet: training deep neural networks with weights and activations constrained to +1 or −1. arXiv:1602.02830.

  22. Hwang, K., & Sung, W. (2014). Fixed-point feedforward deep neural network design using weights +1, 0, and −1. In IEEE workshop on signal processing systems (SiPS), 2014 (pp. 1–6): IEEE.

  23. Anwar, S., Hwang, K., Sung, W. (2015). Fixed point optimization of deep convolutional neural networks for object recognition. In IEEE international conference on acoustics, speech and signal processing (ICASSP), 2015 (pp. 1131–1135): IEEE.

  24. Sung, W., Shin, S., Hwang, K. (2015). Resiliency of deep neural networks under quantization. arXiv:1511.06488.

  25. Knag, P., Liu, C., Zhang, Z. (2016). A 1.40mm² 141mW 898GOPS sparse neuromorphic processor in 40nm CMOS. In 2016 IEEE symposium on VLSI circuits (VLSI-circuits) (pp. 1–2).

  26. Lin, Y., Sakr, C., Kim, Y., Shanbhag, N.R. (2017). PredictiveNet: an energy-efficient convolutional neural network via zero prediction. In 2017 IEEE international symposium on circuits and systems (ISCAS).

  27. Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y., Temam, O. (2014). DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning. In ACM SIGPLAN Notices, (Vol. 49 pp. 269–284): ACM.

  28. Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., Feng, X., Chen, Y., Temam, O. (2015). ShiDianNao: shifting vision processing closer to the sensor. In ACM/IEEE 42nd annual international symposium on computer architecture (ISCA), 2015 (pp. 92–104).

  29. Kang, M., Gonugondla, S.K., Keel, M.S., Shanbhag, N.R. (2015). An energy-efficient memory-based high-throughput VLSI architecture for convolutional networks. In IEEE international conference on acoustics, speech and signal processing (ICASSP), 2015 (pp. 1037–1041): IEEE.

  30. Teodorescu, R., Nakano, J., Tiwari, A., Torrellas, J. (2007). Mitigating parameter variation with dynamic fine-grain body biasing. In 40th annual IEEE/ACM international symposium on microarchitecture (MICRO).

  31. Liang, X., Wei, G.Y., Brooks, D. (2009). Revival: a variation-tolerant architecture using voltage interpolation and variable latency. IEEE Micro, 29, 127–138.

  32. Strang, G. (2003). Introduction to linear algebra, 3rd edn. Wellesley-Cambridge Press.

  33. Bertsekas, D.P., & Tsitsiklis, J.N. (2008). Introduction to probability, 2nd edn. Belmont: Athena Scientific.

  34. Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.222.9220

  35. Rabaey, J.M., Chandrakasan, A., Nikolic, B. (2003). Digital integrated circuits: a design perspective. Upper Saddle River: Prentice-Hall, Inc.

Acknowledgements

This work was supported in part by Systems on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet Centers, sponsored by MARCO and DARPA.

Author information

Corresponding author

Correspondence to Yingyan Lin.

Appendix

In this Appendix, we provide a detailed expression for α in Eq. 15. Complexity is calculated in terms of the number of full adders (FAs). From Eq. 15, α is given by:

$$\alpha=\frac{N_{E}}{N_{M}}=\frac{N_{\text{add-R}}+N_{\text{MUX}}}{N_{DP}} \tag{17}$$

where \(N_{E}\) and \(N_{M}\) denote the complexities of one E-block and one M-block, respectively, \(N_{\text{add-R}}\) denotes the complexity of the summer in Eq. 13, \(N_{\text{MUX}}\) denotes the complexity of the MUX-based shifter in Eq. 13, and \(N_{DP}\) denotes the complexity of one DP implemented using a Baugh-Wooley (BW) multiplier and a ripple-carry adder (RCA). Specifically,

$$N_{\text{add-R}}=(R-1)\left(B_{out}+\left\lceil \log_{2}(R)\right\rceil -1\right) \tag{18}$$
$$N_{\text{MUX}}=B_{out}\left\lceil \log_{2}(B_{out}+1)\right\rceil r_{M2F}\,R \tag{19}$$
$$N_{DP}=NB_{w}B_{in}+(N-1)\left(B_{in}+B_{w}+\left\lceil \log_{2}(N)\right\rceil -1\right) \tag{20}$$

where \(r_{M2F}\) denotes the complexity of a 2:1 MUX normalized to that of an FA, for which we use \(r_{M2F}=3.5/9\) [35], \(\lceil \cdot \rceil\) denotes the ceiling operation, and \(B_{in}\), \(B_{out}\), and \(B_{w}\) denote the input, output, and weight precisions, respectively.
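
As a quick sanity check of Eqs. 17–20, the short script below evaluates α directly from the formulas above. The operating point (\(N=16\), \(R=4\), \(B_{in}=B_{out}=B_{w}=8\)) is a hypothetical example for illustration, not a configuration reported in the paper:

```python
from math import ceil, log2

R_M2F = 3.5 / 9  # normalized complexity of a 2:1 MUX over a full adder [35]

def alpha(N, R, B_in, B_out, B_w):
    """Overhead ratio alpha = N_E / N_M of Eqs. 17-20, in full-adder counts."""
    n_add_r = (R - 1) * (B_out + ceil(log2(R)) - 1)                     # Eq. 18: summer
    n_mux = B_out * ceil(log2(B_out + 1)) * R_M2F * R                   # Eq. 19: MUX-based shifter
    n_dp = N * B_w * B_in + (N - 1) * (B_in + B_w + ceil(log2(N)) - 1)  # Eq. 20: BW multiplier + RCA
    return (n_add_r + n_mux) / n_dp

# Hypothetical operating point: length-16 dot products, rank R = 4, 8-bit precisions.
print(f"alpha = {alpha(N=16, R=4, B_in=8, B_out=8, B_w=8):.3f}")  # alpha = 0.059
```

At this assumed operating point the E-block adds only a few percent of the M-block's complexity, illustrating why the estimators are described as low-cost.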

Cite this article

Lin, Y., Zhang, S. & Shanbhag, N.R. A Rank Decomposed Statistical Error Compensation Technique for Robust Convolutional Neural Networks in the Near Threshold Voltage Regime. J Sign Process Syst 90, 1439–1451 (2018). https://doi.org/10.1007/s11265-018-1332-4
