Abstract
As semiconductor processes scale to ever smaller feature sizes, the difficulty of manufacturing reliable digital designs is challenging traditional approaches to system design. Specifically, as transistors and wires shrink, these components become more prone to complete or parametric failure at manufacturing time. Additionally, the resulting systems are increasingly expensive to produce and less likely to function correctly for as long as intended. To address these challenges, network-on-chip (NoC) based systems have to be designed with reliability and fault tolerance in mind. Toward this goal, a number of design techniques and methodologies are available that promise sufficient fault coverage with controllable overhead in terms of hardware redundancy and performance (e.g., delay/power) degradation. This chapter studies the origin of faults in modern technologies and explains their classification into transient, intermittent, and permanent faults. A survey of fault tolerance methods is presented to demonstrate the diversity of available approaches. Fault tolerance methods for NoCs are studied at different layers of the OSI reference model.
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Tatas, K., Siozios, K., Soudris, D., Jantsch, A. (2014). Power and Thermal Effects and Management. In: Designing 2D and 3D Network-on-Chip Architectures. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-4274-5_4
DOI: https://doi.org/10.1007/978-1-4614-4274-5_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-4273-8
Online ISBN: 978-1-4614-4274-5
eBook Packages: Engineering, Engineering (R0)