Fault Tolerance in Hypercubes

  • Conference paper

Part of the book series: NATO ASI Series (NATO ASI F, volume 103)

Abstract

This paper describes different schemes for tolerating faults in hypercube multiprocessors. A study of hypercube algorithms reveals that, in many cases, computations requiring local communication are mapped onto topologies such as meshes or rings, while the hypercube topology itself is used for global data communication. A faulty hypercube therefore needs to be reconfigured so that it can perform both the local and the global communication required by the algorithm, with minimal performance degradation. Two general approaches can be identified. The first uses the healthy processors and links of a hypercube containing faulty nodes or links to embed topologies such as lower-dimensional hypercubes, rings, meshes, and trees, over which communication is then performed. The second uses hardware redundancy in the form of spare nodes and/or links, and usually requires modifications to the communication hardware. Augmented hypercubes and spare allocation schemes are described.
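To make the first approach concrete, the sketch below illustrates two standard building blocks and is not taken from the paper itself: a ring embedded in a fault-free n-cube via the binary reflected Gray code, and a brute-force enumeration of the fault-free k-dimensional subcubes of a hypercube with faulty nodes. The function names (`gray_ring`, `is_ring_in_hypercube`, `fault_free_subcubes`) and the (mask, value) encoding of a subcube are assumptions made for this example.

```python
from itertools import combinations


def gray_ring(n):
    """Return the 2**n node labels of an n-cube in binary reflected Gray code order.

    Cyclically consecutive labels differ in exactly one bit, so the sequence
    traces a ring that uses only hypercube links.
    """
    return [i ^ (i >> 1) for i in range(1 << n)]


def is_ring_in_hypercube(ring):
    """Check that every pair of ring neighbours is joined by a hypercube link."""
    successors = ring[1:] + ring[:1]
    return all(bin(a ^ b).count("1") == 1 for a, b in zip(ring, successors))


def fault_free_subcubes(n, faulty, k):
    """Yield (fixed_mask, fixed_value) for each k-dimensional subcube of an
    n-cube that contains no faulty node.

    A node x belongs to the subcube when x & fixed_mask == fixed_value.
    This is an exhaustive search intended for illustration only.
    """
    faulty = set(faulty)
    for free_dims in combinations(range(n), k):
        free_mask = sum(1 << d for d in free_dims)
        fixed_mask = ((1 << n) - 1) ^ free_mask
        # Fixed-bit patterns that are ruled out because a faulty node has them.
        blocked = {f & fixed_mask for f in faulty}
        for value in range(1 << n):
            if value & free_mask == 0 and value not in blocked:
                yield fixed_mask, value


if __name__ == "__main__":
    print(gray_ring(3))
    print(list(fault_free_subcubes(3, {0}, 2)))
```

With a single faulty node in a 3-cube, each choice of two free dimensions leaves exactly one healthy 2-dimensional subcube, namely the face opposite the fault. A ring for local communication could then be laid out inside such a subcube with the same Gray code ordering.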

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Balakrishnan, S., Özgüner, F., Izadi, B. (1993). Fault Tolerance in Hypercubes. In: Özgüner, F., Erçal, F. (eds) Parallel Computing on Distributed Memory Multiprocessors. NATO ASI Series, vol 103. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-58066-6_14

  • DOI: https://doi.org/10.1007/978-3-642-58066-6_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-63460-4

  • Online ISBN: 978-3-642-58066-6

  • eBook Packages: Springer Book Archive
