Energy-Efficient Design of Advanced Machine Learning Hardware

  • Muhammad Abdullah Hanif
  • Rehan Hafiz
  • Muhammad Usama Javed
  • Semeen Rehman
  • Muhammad Shafique
Chapter

Abstract

The exponentially growing rates of data production in the current era of the internet of things (IoT), cyber-physical systems (CPS), and big data pose ever-increasing demands for massive data processing, storage, and transmission. Such systems are required to be robust, intelligent, and self-learning while also delivering high performance and power/energy efficiency. As a result, artificial intelligence and machine learning research has surged across numerous communities (e.g., deep learning and hardware architecture).

This chapter first provides a brief overview of machine learning and neural networks, followed by a few of the most prominent techniques that have been used so far for designing energy-efficient accelerators for machine learning algorithms, particularly deep neural networks. Inspired by the scalable-effort principles of the human brain (i.e., scaling the computing effort to the precision required by the task, or to the recurrent execution of the same or similar tasks), we focus on the (re-)emerging area of approximate computing (aka InExact Computing), which aims at relaxing the bounds of precise/exact computing to provide new opportunities for improving the area, power/energy, and performance efficiency of systems by orders of magnitude at the cost of reduced output quality. We also walk through a holistic methodology that encompasses the complete design flow, i.e., from algorithms to architectures. Finally, we summarize the challenges and the associated research roadmap that can aid in developing energy-efficient and adaptable hardware accelerators for machine learning.
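To make the approximate-computing trade-off concrete, the sketch below (an illustration under our own assumptions, not a circuit described in this chapter) models a lower-part OR adder in Python: the k low-order bits of the operands bypass the carry chain and are combined with a cheap bitwise OR, which in hardware reduces adder area, delay, and energy at the cost of a small, bounded output error.

    # Minimal software model of a lower-part OR adder (LOA), a classic
    # approximate-computing building block (illustrative sketch only).
    def loa_add(a: int, b: int, width: int = 8, k: int = 3) -> int:
        """Approximately add two unsigned `width`-bit integers."""
        mask_lo = (1 << k) - 1
        lo = (a & mask_lo) | (b & mask_lo)     # lower part: OR instead of add, no carry
        hi = ((a >> k) + (b >> k)) << k        # upper part: exact addition
        return (hi | lo) & ((1 << width) - 1)  # wrap to the adder width

    if __name__ == "__main__":
        a, b = 100, 29
        print("exact:", a + b, "approximate:", loa_add(a, b))  # prints: exact: 129 approximate: 125

Error-resilient workloads such as neural-network inference can typically absorb such small arithmetic errors with little or no loss in output quality, which is precisely what makes this class of approximate circuits attractive for machine-learning accelerators.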

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Muhammad Abdullah Hanif (1)
  • Rehan Hafiz (2)
  • Muhammad Usama Javed (2)
  • Semeen Rehman (1)
  • Muhammad Shafique (1)

  1. Vienna University of Technology (TU Wien), Vienna, Austria
  2. Information Technology University (ITU), Lahore, Pakistan
