Active Accelerated Self-healing as a Key Design Knob for Cross-Layer Resilience

  • Xinfei Guo
  • Mircea R. Stan


Cross-layer resilience comes closer to an optimal way of maximizing reliability by breaking the boundaries between abstraction layers across the system stack. In this chapter, we discuss how accelerated and active self-healing methods can be applied effectively at different levels of the system hierarchy. At the circuit level, the blocks presented in the previous chapter serve as the underlying infrastructure for recovery; at the architecture level, unit-level self-healing and the use of intrinsic heat exploit architectural opportunities to reduce the hardware cost of recovery; at the system level, scheduling that follows a circadian rhythm can be implemented to deeply heal the circuit. Together, these techniques complement one another and compensate for the trade-offs that recovery requires.


Accelerated self-healing, Cross-layer resilience, Unit-level healing, Dark silicon, Scheduling, Multicore



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Xinfei Guo (1)
  • Mircea R. Stan (1)
  1. University of Virginia, Charlottesville, USA
