Automated Design Flows and Run-Time Optimization for Reconfigurable Microarchitectures

  • Saurabh Jain
  • Longyang Lin
  • Massimo Alioto


This chapter introduces a systematic methodology for designing reconfigurable microarchitectures through automated, architecture-agnostic design flows. The main goal is to enrich a baseline microarchitecture with additional registers for throughput enhancement, and then make selected registers bypassable so that the design can flexibly switch among different microarchitectures. Design methodologies for reconfigurable SRAM memories are described along similar lines. As a common thread, the chapter discusses drop-in solutions that bring this capability to existing architectures at very low design effort.
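The idea of bypassable registers can be illustrated with a small behavioral model. The sketch below is not the chapter's design flow, which operates on gate-level netlists. It is a minimal cycle-level Python model, with hypothetical names (`BypassableRegister`, `clock_cycle`, `latency_in_cycles`, `STAGES`) and arbitrary stage functions, assuming a linear pipeline with one register after each stage. Setting a register's bypass flag makes it transparent, so two adjacent stages merge into one and the same hardware behaves as a shallower pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class BypassableRegister:
    bypass: bool = False          # True: transparent, False: clocked
    state: Optional[int] = None   # value captured at the last clock edge


def clock_cycle(stages: List[Callable[[int], int]],
                regs: List[BypassableRegister],
                x: Optional[int]) -> Optional[int]:
    """Evaluate one clock cycle; None models a pipeline bubble."""
    next_states = {}
    v = x
    for i, (stage, reg) in enumerate(zip(stages, regs)):
        if v is not None:
            v = stage(v)              # combinational logic of stage i
        if not reg.bypass:
            next_states[i] = v        # captured at the end of this cycle
            v = reg.state             # downstream sees the old value
    for i, s in next_states.items():  # clock edge: commit all registers
        regs[i].state = s
    return v


def latency_in_cycles(stages: List[Callable[[int], int]],
                      bypass_flags: List[bool],
                      x: int) -> Tuple[int, int]:
    """Return (cycle index at which the result appears, result)."""
    regs = [BypassableRegister(bypass=b) for b in bypass_flags]
    for cycle in range(len(stages) + 1):
        out = clock_cycle(stages, regs, x if cycle == 0 else None)
        if out is not None:
            return cycle, out
    raise RuntimeError("no output produced")


# Arbitrary illustrative stage functions (an assumption, not from the chapter).
STAGES = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]

if __name__ == "__main__":
    # Deep configuration: all four registers clocked.
    print(latency_in_cycles(STAGES, [False, False, False, False], 5))  # (4, 81)
    # Shallow configuration: every other register bypassed.
    print(latency_in_cycles(STAGES, [True, False, True, False], 5))    # (2, 81)
```

Both configurations compute the same result; only the number of cycles changes. The model leaves out timing: in the shallow configuration each cycle spans two stages of logic, so the clock period must be longer.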


Design methodology; Digital design; Design flow; Gate-level netlist manipulation; Automated synthesis and place&route; CAD algorithm; Reconfigurable microarchitecture; Reconfigurable SRAM; Pipestage-level reconfiguration; Thread-level reconfiguration; Bank-level reconfiguration; Pipestage; Pipeline stage; Fan-out-of-4 delay; Re-pipelining; Register level; Linear pipeline; Feedforward pipeline; Feedback pipeline; Loop; Register branch; Dynamic energy; Leakage energy; Above-threshold region; Near-threshold region; Sub-threshold region; Leakage-dynamic energy ratio; Fixed microarchitectures; Minimum energy point (MEP); Dynamically adaptable pipelines; Dynamic voltage frequency scaling; Power mode; Bypassable register; Bypassable flip-flop; Non-bypassable register; Non-bypassable flip-flop; EDA tool; Retiming; Delay overhead; Register bypassing; Flip-flop bypassing; Throughput enhancement; Control flow; Pipeline bubble; Time-interleaved microarchitecture; Time interleaving; Input stream; Instruction stream; Channel; Gate-level netlist; SRAM; Instruction memory; Data memory; Column multiplexing; Column multiplexer; Bitline; Wordline; Sense amplifier; Precharge driver; Write driver; Bitcell; Memory bank; Memory sub-bank; Reconfigurable array organization; Access time; Row aggregation; Drop-in microarchitecture reconfiguration; Electronic design automation (EDA); Pipeline stage unification; AES; Transparent register; Static microarchitecture; Dynamic microarchitecture; Cycle-level timing; Netlist; Skeleton graph; Register identification; Bypassable register replacement; Netlist-to-skeleton graph; Graph weighting; Level identification; Cutset; Feedforward cutset; Cutset identification; Cutset-to-pipeline mapping; Even-numbered register identification; Script; Place&route (PNR); Behavioral RTL; Register transfer level (RTL); Weighted skeleton graph; Tcl script; Cutset-based identification; Non-linear pipelines; Linear pipelines; Register insertion; Register merging; Reconvergent path; Branching path; Graph; Netlist graph; Hash table; Flip-flop reset; Graph edge; Graph node; Dummy node; Dummy edge; Static timing analysis (STA); Weight; Graph traversal; Depth-first traversal; Row decoder; Reconfigurable decoder



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Saurabh Jain (1)
  • Longyang Lin (1)
  • Massimo Alioto (1)
  1. National University of Singapore, Singapore, Singapore
