Hardware–Software Approximations for Deep Neural Networks

  • Muhammad Abdullah Hanif
  • Muhammad Usama Javed
  • Rehan Hafiz
  • Semeen Rehman
  • Muhammad Shafique


Neural networks (NNs) are the state of the art for many artificial intelligence (AI) applications. However, to facilitate the training process, most neural networks are over-parameterized, which results in significant computational and memory overheads. Numerous optimization techniques have therefore been proposed to alleviate the computational and memory requirements of these NNs. In this chapter, we highlight one of the prominent paradigms, i.e., approximate computing, which can significantly reduce the resource requirements of these networks. We describe a sensitivity analysis methodology for estimating the significance of sub-parts of state-of-the-art NNs. Based upon this significance analysis, we then present a methodology for employing a tolerable amount of approximation at various stages of the network, i.e., removal of ineffectual filters/neurons at the software layer, and precision reduction and memory approximations at the hardware layer. Towards the end of this chapter, we also highlight a few of the prominent challenges in adopting different types of approximation and the effects they have on the overall efficiency and accuracy of the baseline networks.
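As an illustration of the software-layer step, the significance of a convolutional filter is often estimated from the magnitude of its weights. The sketch below is a minimal, hypothetical example (using NumPy, and not the chapter's exact methodology): it ranks filters by their L1 norm and removes the least significant ones.

```python
import numpy as np

def filter_significance(weights):
    # weights: (num_filters, in_channels, kh, kw).
    # Filters with small L1 norms tend to produce weak activations,
    # making them candidates for removal ("ineffectual" filters).
    return np.abs(weights).sum(axis=(1, 2, 3))

def prune_filters(weights, keep_ratio):
    # Keep only the top `keep_ratio` fraction of filters by significance.
    scores = filter_significance(weights)
    k = max(1, int(round(keep_ratio * len(scores))))
    keep = np.sort(np.argsort(scores)[::-1][:k])  # preserve filter order
    return weights[keep], keep

# Toy layer: 8 filters, 3 input channels, 3x3 kernels
rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))
pruned, kept = prune_filters(w, keep_ratio=0.5)
print(pruned.shape)  # (4, 3, 3, 3)
```

In practice, pruning one layer's filters also removes the corresponding input channels of the next layer, and the network is typically fine-tuned afterwards to recover accuracy.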

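At the hardware layer, precision reduction is commonly studied by emulating fixed-point arithmetic in software. The following sketch is an illustrative assumption, not the chapter's exact scheme: it rounds values to a signed fixed-point grid with a given number of integer and fractional bits, saturating values that fall outside the representable range.

```python
import numpy as np

def quantize_fixed_point(x, int_bits, frac_bits):
    # Simulate a signed fixed-point format with `int_bits` integer bits
    # and `frac_bits` fractional bits: round to the nearest representable
    # step and saturate at the range limits.
    step = 2.0 ** -frac_bits
    max_val = 2.0 ** int_bits - step
    min_val = -(2.0 ** int_bits)
    q = np.round(x / step) * step
    return np.clip(q, min_val, max_val)

x = np.array([0.37, -1.26, 3.9, -5.0])
# 0.37 -> 0.375, -1.26 -> -1.25, 3.9 -> 3.875, -5.0 saturates to -4.0
print(quantize_fixed_point(x, int_bits=2, frac_bits=4))
```

Sweeping the bit-widths per layer, while monitoring the resulting classification accuracy, is one way to find a tolerable amount of precision reduction.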


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Muhammad Abdullah Hanif (1)
  • Muhammad Usama Javed (2)
  • Rehan Hafiz (2)
  • Semeen Rehman (1)
  • Muhammad Shafique (1)

  1. Vienna University of Technology (TU Wien), Vienna, Austria
  2. Information Technology University (ITU), Lahore, Pakistan