Abstract
This paper describes schemes for tolerating faults in hypercube multiprocessors. A study of hypercube algorithms reveals that, in many cases, computations requiring local communication are mapped onto topologies such as meshes or rings, while the hypercube topology itself is used for global data communication. A faulty hypercube must therefore be reconfigured so that it can perform both the local and the global communication the algorithm requires, with minimal performance degradation. Two general approaches can be identified. The first uses the healthy processors and links of a hypercube with faulty nodes or links to embed topologies such as lower-dimensional hypercubes, rings, meshes, and trees for communication. The second uses hardware redundancy in the form of spare nodes and/or links, and usually requires modifications to the communication hardware. Augmented hypercubes and spare allocation schemes are described.
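As background for the embedding approach, the following minimal sketch shows the textbook reflected Gray code embedding of a ring in a fault-free hypercube, together with a check that fails once a node on the ring is marked faulty. It is an illustration only, not one of the reconfiguration schemes described in the paper; the function names (`gray_code`, `ring_embedding`, `ring_is_valid`) and the `faulty` parameter are hypothetical and chosen for this example.

```python
def gray_code(i):
    """Return the i-th reflected Gray code word."""
    return i ^ (i >> 1)


def ring_embedding(n):
    """Map ring position k to a node label of an n-dimensional hypercube.

    Consecutive Gray code words differ in exactly one bit, so
    consecutive ring positions land on adjacent hypercube nodes.
    """
    return [gray_code(k) for k in range(1 << n)]


def is_hypercube_edge(u, v):
    """Two node labels are adjacent iff they differ in exactly one bit."""
    diff = u ^ v
    return diff != 0 and (diff & (diff - 1)) == 0


def ring_is_valid(ring, faulty=frozenset()):
    """Check that the ring avoids faulty nodes and uses only hypercube links.

    `faulty` is an assumed set of faulty node labels (illustrative only).
    """
    if any(node in faulty for node in ring):
        return False
    size = len(ring)
    return all(
        is_hypercube_edge(ring[k], ring[(k + 1) % size]) for k in range(size)
    )


ring = ring_embedding(3)
print(ring)                              # [0, 1, 3, 2, 6, 7, 5, 4]
print(ring_is_valid(ring))               # True
print(ring_is_valid(ring, faulty={5}))   # False
```

The last call shows why reconfiguration is needed: a single faulty node invalidates the standard embedding, so a fault-tolerant scheme has to find a different ring among the healthy nodes or fall back on spare hardware.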
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Balakrishnan, S., Özgüner, F., Izadi, B. (1993). Fault Tolerance in Hypercubes. In: Özgüner, F., Erçal, F. (eds) Parallel Computing on Distributed Memory Multiprocessors. NATO ASI Series, vol 103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-58066-6_14
DOI: https://doi.org/10.1007/978-3-642-58066-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-63460-4
Online ISBN: 978-3-642-58066-6
eBook Packages: Springer Book Archive