Abstract
Increasing microprocessor vulnerability to soft errors induced by neutron and alpha particle strikes prevents aggressive scaling and integration of transistors in future technologies if left unaddressed. Previously proposed instruction-level redundant execution, as a means of detecting errors, suffers from a severe performance loss due to the resource shortage caused by the large number of redundant instructions injected into the superscalar core. In this paper, we propose to apply three architectural enhancements, namely 1) floating-point unit sharing (FUS), 2) prioritizing primary instructions (PRI), and 3) early retiring of redundant instructions (ERT), that enable transient-fault detecting redundant execution in superscalar microarchitectures with a much smaller performance penalty, while maintaining the original full coverage of soft errors. In addition, our enhancements are compatible with many other proposed techniques, allowing for further performance improvement.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, J.S., Link, G.M., John, J.K., Wang, S., Ziavras, S.G. (2005). Resource-Driven Optimizations for Transient-Fault Detecting SuperScalar Microarchitectures. In: Srikanthan, T., Xue, J., Chang, CH. (eds) Advances in Computer Systems Architecture. ACSAC 2005. Lecture Notes in Computer Science, vol 3740. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11572961_17
DOI: https://doi.org/10.1007/11572961_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29643-0
Online ISBN: 978-3-540-32108-8
eBook Packages: Computer Science, Computer Science (R0)