Abstract
We discuss techniques for preserving locality of reference in index spaces when they are mapped to the memory units of a distributed memory architecture. In particular, we discuss the use of multidimensional address spaces instead of linearized address spaces, the partitioning of irregular grids, and the placement of partitions among nodes. We also discuss a set of communication primitives that we have found very useful in implementing scientific and engineering applications on the Connection Machine systems. We briefly review some of the techniques used to fully utilize the bandwidth of the binary cube network of the CM-2 and CM-200, and give performance data from implementations of the communication primitives.
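As a rough illustration of why a multidimensional address space can preserve locality better than a linearized one, the following sketch counts how many nearest-neighbor pairs of a small two-dimensional grid end up on different nodes under each mapping. It is not taken from the paper: the grid size, the node count, the four-point neighbor pattern, and all function names (`linear_owner`, `block_owner`, `cut_edges`) are assumptions chosen for the example.

```python
# Hypothetical sketch: compare two ways of assigning an n x n grid to p nodes.
# All names and sizes here are illustrative assumptions, not from the paper.

def linear_owner(i, j, n, p):
    """Linearized address space: row-major index, split into p consecutive chunks."""
    return (i * n + j) // (n * n // p)

def block_owner(i, j, n, q):
    """Multidimensional address space: q x q node grid, each node holds a square subgrid."""
    b = n // q  # side length of each node's subgrid
    return (i // b) * q + (j // b)

def cut_edges(owner, n):
    """Count nearest-neighbor pairs (non-periodic grid) whose endpoints lie on different nodes."""
    cut = 0
    for i in range(n):
        for j in range(n):
            if i + 1 < n and owner(i, j) != owner(i + 1, j):
                cut += 1
            if j + 1 < n and owner(i, j) != owner(i, j + 1):
                cut += 1
    return cut

n, p, q = 16, 16, 4  # 16 x 16 grid, 16 nodes arranged as 4 x 4 in the block case
linear_cut = cut_edges(lambda i, j: linear_owner(i, j, n, p), n)
block_cut = cut_edges(lambda i, j: block_owner(i, j, n, q), n)
print(linear_cut, block_cut)
```

Under these assumptions the linearized mapping gives each node one full grid row, so every vertical neighbor pair crosses a node boundary, while the square subgrids of the two-dimensional mapping keep most neighbor pairs on the same node. The counts only measure off-node references; they say nothing about the network distance between the nodes involved.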
Copyright information
© 1993 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Johnsson, S.L. (1993). Massively parallel computing: Data distribution and communication. In: Meyer, F., Monien, B., Rosenberg, A.L. (eds) Parallel Architectures and Their Efficient Use. Nixdorf 1992. Lecture Notes in Computer Science, vol 678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56731-3_9
DOI: https://doi.org/10.1007/3-540-56731-3_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-56731-8
Online ISBN: 978-3-540-47637-5
eBook Packages: Springer Book Archive