Foundations of Dependable Computing pp 71-86
Network Fault Detection and Recovery in the Chaos Router
Abstract
Chaotic routing, which allows packets to follow non-minimal routes, provides a basic level of fault tolerance by allowing messages to be routed around faults without requiring a priori knowledge of their locations. However, the mechanisms for doing this can be slow and clumsy at times. We augment Chaotic routing with a limited amount of hardware to support fault detection, identification, and reconfiguration so that the network can automatically reconfigure itself when faults occur. We present a high-level design of these mechanisms, driven by the goal of achieving reasonable reliability without exorbitant cost.