
A Fault-Tolerant, Dynamically Scheduled Pipeline Structure for Chip Multiprocessors

  • Hananeh Aliee
  • Hamid Reza Zarandi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6894)

Abstract

This paper presents a fault-tolerant, dynamically scheduled pipeline structure for chip multiprocessors (CMPs). The technique exploits the redundancy already present in simultaneous multithreading (SMT), superscalar chip multiprocessors to provide low-overhead, broad fault coverage at the cost of some performance degradation. The pipeline structure operates in two modes: 1) high-performance and 2) highly-reliable. In high-performance mode, each core works as a real SMT, superscalar processor. The highly-reliable mode, which is the main contribution, 1) enhances the reliability of the system without adding extra redundancy strictly for fault tolerance, 2) detects both transient and permanent faults, and 3) recovers from detected faults. The experimental results show that the diagnosis mechanism diagnoses faults quickly and accurately. The fault detection latency of the technique is equal to the pipeline length of the processor, and it provides high fault detection coverage. Moreover, the reliable processor can function quite capably in the presence of both transient and permanent faults, despite using no redundancy beyond what is already available in a modern microprocessor. In addition, in the highly-reliable mode, static and dynamic power consumption are reduced by 25% and 36%, respectively.
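The abstract does not describe the detection and recovery mechanism in detail, so the following is only a toy sketch of the general idea of reusing existing functional units for fault tolerance, not the authors' design. It assumes (hypothetically) that each operation runs on two units and the results are compared, that a mismatch triggers re-execution to separate transient from permanent faults, and that a third unit arbitrates a persistent mismatch. The names `Unit`, `reliable_add`, and `transient_flip` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Unit:
    """Hypothetical functional unit; `fault`, if set, corrupts its results."""
    name: str
    fault: Optional[Callable[[int], int]] = None
    enabled: bool = True

    def execute(self, a: int, b: int) -> int:
        result = a + b
        return self.fault(result) if self.fault else result


def transient_flip() -> Callable[[int], int]:
    """Fault model (assumption): flips the low bit on the first use only."""
    fired = [False]

    def corrupt(result: int) -> int:
        if not fired[0]:
            fired[0] = True
            return result ^ 1
        return result

    return corrupt


def reliable_add(units: List[Unit], a: int, b: int) -> int:
    """Execute a + b redundantly on already-present units and compare."""
    active = [u for u in units if u.enabled]
    if len(active) < 2:
        raise RuntimeError("need two enabled units for redundant execution")
    first, second = active[0], active[1]

    r1, r2 = first.execute(a, b), second.execute(a, b)
    if r1 == r2:
        return r1  # results agree: no fault detected

    # Mismatch: re-execute. Agreement now suggests a transient fault.
    r1, r2 = first.execute(a, b), second.execute(a, b)
    if r1 == r2:
        return r1

    # Persistent mismatch: treat as permanent and let a third unit arbitrate.
    if len(active) < 3:
        raise RuntimeError("persistent mismatch but no unit left to arbitrate")
    r3 = active[2].execute(a, b)
    faulty = first if r1 != r3 else second
    faulty.enabled = False  # stop scheduling work on the diagnosed unit
    return r3
```

In this sketch a transient fault costs one re-execution, while a permanent fault removes one unit from scheduling, which mirrors the abstract's trade of performance for reliability without adding dedicated redundant hardware.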

Keywords

Reliability, Transient fault, Permanent fault, Fault tolerance, Pipeline structure, Chip multiprocessor, Superscalar processor


Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Hananeh Aliee (1)
  • Hamid Reza Zarandi (1)
  1. Department of Computer Engineering and Information Technology, Amirkabir University of Technology (Tehran Polytechnic), Tehran, Iran
