Abstract
Future many-core processors will probably contain more than 1000 cores on a single die. However, continued scaling of silicon fabrication technology makes such chips orders of magnitude more vulnerable to errors, so reliability techniques have to be an essential part of many-core processors. Redundant execution is an efficient way to improve reliability. Existing redundant execution mechanisms such as SRT, CRT, DIVA, and RECVF aim to reduce the performance loss of redundancy by using execution assistance and other speculative mechanisms. We take a different approach: we use idle cores in many-core processors to perform the redundant execution. We propose core-level dynamic redundancy (CDR), which has the following distinguishing properties: i) it removes hardware restrictions and supports redundancy on an arbitrary core; ii) it dynamically chooses the core that performs the redundant execution according to the cores' conditions, and so effectively balances reliability, performance, and power. Experimental results show the effectiveness of the proposed techniques.
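The abstract does not give the selection policy, so the following is only a minimal sketch of the general idea of choosing an idle core at run time to re-execute a task and compare results. Everything in it is an assumption made for illustration: the `Core` record, the 2D mesh coordinates, the nearest-idle-core heuristic in `pick_checker`, and the software comparison in `run_redundant` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Core:
    """Hypothetical core descriptor; a 2D mesh layout is assumed."""
    core_id: int
    x: int
    y: int
    idle: bool


def manhattan(a: Core, b: Core) -> int:
    """Hop count between two cores on the assumed mesh."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def pick_checker(primary: Core, cores: Sequence[Core]) -> Optional[Core]:
    """Pick the nearest idle core (assumed heuristic), or None if none is idle."""
    candidates = [c for c in cores if c.idle and c.core_id != primary.core_id]
    if not candidates:
        return None
    # Ties are broken by core_id so the choice is deterministic.
    return min(candidates, key=lambda c: (manhattan(primary, c), c.core_id))


def run_redundant(
    task: Callable[[Any], Any],
    arg: Any,
    primary: Core,
    cores: Sequence[Core],
) -> Tuple[Any, bool]:
    """Run task once, re-run it if a checker core exists, and compare.

    Returns (result, checked). Both runs happen in this process; the
    checker core is only modeled, not actually scheduled on.
    """
    result = task(arg)
    checker = pick_checker(primary, cores)
    if checker is None:
        return result, False  # no idle core: run without redundancy
    shadow = task(arg)  # stands in for execution on the checker core
    if shadow != result:
        raise RuntimeError(f"mismatch detected by core {checker.core_id}")
    return result, True


cores = [
    Core(0, 0, 0, idle=False),
    Core(1, 1, 0, idle=True),
    Core(2, 3, 3, idle=True),
    Core(3, 0, 1, idle=False),
]
value, checked = run_redundant(lambda n: n * n, 7, cores[0], cores)
```

When no core is idle, the sketch falls back to unchecked execution, which is one simple way to trade reliability for performance; a real policy could also weigh power or temperature.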
References
Borkar, S.: Thousand core chips: a technology perspective. In: Proceedings of the 44th annual Design Automation Conference (June 2007)
Srinivasan, J., Adve, S.V., Bose, P., Rivers, J.A.: The impact of technology scaling on lifetime reliability. In: Intl. Conf. on Dependable Systems and Networks (June 2004)
Reinhardt, S.K., Mukherjee, S.S.: Transient fault detection via simultaneous multithreading. In: Intl. Symp. on Computer Architecture (June 2000)
Mukherjee, S.S., Kontz, M., Reinhardt, S.K.: Detailed design and evaluation of redundant multithreading alternatives. In: Intl. Symp. on Computer Architecture (May 2002)
Austin, T.: DIVA: A Reliable Substrate For Deep Submicron Microarchitecture Design. In: Proceedings of the 32nd MICRO, pp. 196–207 (1999)
Subramanyan, P., Singh, V., Saluja, K.K., Larsson, E.: Energy-Efficient Fault Tolerance in Chip Multiprocessors Using Critical Value Forwarding. In: Intl. Conf. on Dependable Systems and Networks (June 2010)
Rotenberg, E.: AR-SMT: A microarchitectural approach to fault tolerance in microprocessors. In: Intl. Symp. on Fault-Tolerant Computing (June 1999)
Vijaykumar, T.N., Pomeranz, I., Cheng, K.: Transient-fault recovery using simultaneous multithreading. In: Intl. Symp. on Computer Architecture (May 2002)
Gomaa, M., Scarbrough, C., Vijaykumar, T.N., Pomeranz, I.: Transient-fault recovery for chip multiprocessors. In: Intl. Symp. on Computer Architecture (June 2003)
Smolens, J.C., Gold, B.T., Falsafi, B., Hoe, J.C.: Reunion: Complexity-effective multicore redundancy. In: Intl. Symp. on Microarchitecture (December 2006)
Nomura, S., Sinclair, M.D., Ho, C., Govindaraju, V., de Kruijf, M., Sankaralingam, K.: Sampling + DMR: Practical and Low-overhead Permanent Fault Detection. In: Intl. Symp. on Computer Architecture (June 2011)
Smolens, J.C., Gold, B.T., Kim, J., Falsafi, B., Hoe, J.C., Nowatzyk, A.G.: Fingerprinting: bounding soft-error detection latency and bandwidth. In: Intl. Conf. on ASPLOS (October 2004)
Greskamp, B., Torrellas, J.: Paceline: Improving Single-Thread Performance in Nanoscale CMPs through Core Overclocking. In: Proceedings of the 16th PACT (September 2007)
Subramanyan, P., Singh, V., Saluja, K.K., Larsson, E.: Multiplexed Redundant Execution: A Technique for Efficient Fault Tolerance in Chip Multiprocessors. In: Proc. of DATE (2010)
Zhang, L., Han, Y., Xu, Q., Li, X.: Defect tolerance in homogeneous manycore processors using core-level redundancy with unified topology. In: Proc. Design, Automation and Test in Europe, DATE 2008, pp. 891–896 (2008)
Wentzlaff, D., Griffin, P., Hoffmann, H., Bao, L.W., Edwards, B., Ramey, C., Mattina, M., Miao, C.C., Brown, J.F., Agarwal, A.: On-chip interconnection architecture of the tile processor. IEEE Micro 27(5), 15–31 (2007)
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jia, W., Zhang, C., Fu, J., Li, R. (2013). Architecting Dependable Many-Core Processors Using Core-Level Dynamic Redundancy. In: Yuan, Y., Wu, X., Lu, Y. (eds) Trustworthy Computing and Services. ISCTCS 2012. Communications in Computer and Information Science, vol 320. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35795-4_86
DOI: https://doi.org/10.1007/978-3-642-35795-4_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-35794-7
Online ISBN: 978-3-642-35795-4
eBook Packages: Computer Science (R0)