A realizable efficient parallel architecture
The near future will bring large-scale parallel computers able to provide computing power of more than one TFlop per second. It is commonly agreed that these systems will be based on the model of asynchronous processors connected by a point-to-point network. A number of different network architectures have been presented in the past.
In this paper we present an architectural principle that combines efficiency, realizability for very large systems, and the inherent reliability needed for such large parallel processing systems. The Fat Mesh of Clos network principle presented here can be scaled in many ways to meet the specific requirements of a system design.
Two realizations of this principle are presented. One is based on static switches combined to form a fully reconfigurable system. This architecture has been realized for systems containing up to 320 processors.
The other realization uses dynamic routing switches. By combining wormhole routing with randomized and locally adaptive routing, this network provides large capacity and very low latency. The efficiency of our principle is demonstrated by simulations.
Both realizations presented here are built and commercialized by Parsytec Computer.
Keywords: Network Capacity, External Edge, Outgoing Link, Processor Network, Clos Network