Data clustering for efficient approximate computing

  • Michael G. Jordan
  • Marcelo Brandalero
  • Guilherme M. Malfatti
  • Geraldo F. Oliveira
  • Arthur F. Lorenzon
  • Bruno C. da Silva
  • Luigi Carro
  • Mateus B. Rutzig
  • Antonio Carlos S. Beck


Given the saturation of single-threaded performance improvements in general-purpose processors, novel architectural techniques are required to meet emerging demands. In this paper, we propose a generic acceleration framework for approximate algorithms that replaces function execution with look-up accesses to tables held in dedicated memories. A strategy based on the K-Means clustering algorithm is used at compile-time to learn mappings from arbitrary function inputs to frequently occurring outputs. At run-time, these learned values are fetched from the dedicated look-up tables, and the best result is selected by a Nearest-Centroid Classifier implemented in hardware. The proposed approach improves on the state-of-the-art neural acceleration solution, with nearly \(3\times\) better performance, energy reductions ranging from \(18.72\%\) to \(90.99\%\), and \(17\%\) area savings at similar levels of quality, thus opening new opportunities for performance harvesting in approximate accelerators.
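The two phases described above can be illustrated in software. The following is a minimal sketch, not the authors' implementation: the paper performs the nearest-centroid selection in hardware, whereas this example does it in plain Python. The function `target`, the helpers `build_table` and `approx_call`, and the choice to store the mean output of each cluster in the table are all assumptions made for illustration.

```python
import math
import random


def target(x, y):
    """Hypothetical error-tolerant function to approximate (an assumption)."""
    return math.sqrt(x * x + y * y)


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest(centroids, point):
    """Nearest-centroid rule: index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], point))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-Means over a list of tuples; returns k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(g) for c in zip(*g))
    return centroids


def build_table(func, samples, k):
    """'Compile-time' phase: cluster sampled inputs, keep one output per cluster.

    Assumption: each table entry is the mean output of the cluster members.
    """
    centroids = kmeans(samples, k)
    outputs = [[] for _ in range(k)]
    for s in samples:
        outputs[nearest(centroids, s)].append(func(*s))
    table = [
        sum(o) / len(o) if o else func(*centroids[i])
        for i, o in enumerate(outputs)
    ]
    return centroids, table


def approx_call(centroids, table, *args):
    """'Run-time' phase: replace the function call by a table look-up."""
    return table[nearest(centroids, args)]


if __name__ == "__main__":
    rng = random.Random(42)
    samples = [(rng.random(), rng.random()) for _ in range(500)]
    centroids, table = build_table(target, samples, k=16)
    print("exact :", target(0.3, 0.7))
    print("approx:", approx_call(centroids, table, 0.3, 0.7))
```

In this sketch the number of clusters `k` plays the role of the table size: a larger `k` should lower the approximation error at the cost of more stored entries and more distance comparisons per look-up.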


Keywords: Approximate computing, Approximate memoization, Data clustering, Reuse




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Michael G. Jordan (3, corresponding author)
  • Marcelo Brandalero (3)
  • Guilherme M. Malfatti (3)
  • Geraldo F. Oliveira (3)
  • Arthur F. Lorenzon (1)
  • Bruno C. da Silva (3)
  • Luigi Carro (3)
  • Mateus B. Rutzig (2)
  • Antonio Carlos S. Beck (3)

  1. Campus Alegrete, Universidade Federal do Pampa (UNIPAMPA), Bagé, Brazil
  2. Universidade Federal de Santa Maria (UFSM), Santa Maria, Brazil
  3. Institute of Informatics, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil
