  • Hantao Huang
  • Hao Yu
Part of the Computer Architecture and Design Methodologies book series (CADM)


In this chapter, we introduce the background of Internet-of-Things (IoT) systems and discuss the three major technology layers in IoT. We then discuss machine-learning-based data analytics techniques from both the algorithm and the computation perspectives. As machine learning algorithms grow more complex, there is an emerging need to re-examine the current computation platform, and a dedicated hardware computation platform becomes a solution for IoT systems. Finally, we discuss hardware computation platforms based on both CMOS and RRAM technology.


IoT · Machine learning · Energy-efficient computation · Neural network



Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
  2. Department of Electrical and Electronic Engineering, Southern University of Science and Technology, Shenzhen, China
