Massively parallel computing: Data distribution and communication

  • Conference paper
Parallel Architectures and Their Efficient Use (Nixdorf 1992)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 678)

Abstract

We discuss some techniques for preserving locality of reference in index spaces when mapped to memory units in a distributed memory architecture. In particular, we discuss the use of multidimensional address spaces instead of linearized address spaces, partitioning of irregular grids, and placement of partitions among nodes. We also discuss a set of communication primitives we have found very useful on the Connection Machine systems in implementing scientific and engineering applications. We briefly review some of the techniques used to fully utilize the bandwidth of the binary cube network of the CM-2 and CM-200, and give some performance data from implementations of communication primitives.
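
As a rough illustration of why a multidimensional address space can preserve locality better than a linearized one, the following sketch counts nearest-neighbor pairs of a 2-D index space that end up on different nodes under the two mappings. The grid size, node count, and function names (`linearized_owner`, `multidim_owner`, `cut_edges`) are assumptions made for this example and are not taken from the paper.

```python
def linearized_owner(i, j, n, p):
    """Node owning (i, j) when the n x n index space is flattened
    row-major and cut into p equal consecutive chunks (assumed scheme)."""
    return (i * n + j) // (n * n // p)


def multidim_owner(i, j, n, q):
    """Node owning (i, j) when each axis is split into q blocks,
    giving a q x q arrangement of nodes (assumed scheme)."""
    b = n // q  # block edge length along each axis
    return (i // b) * q + (j // b)


def cut_edges(owner, n):
    """Count nearest-neighbor pairs of an n x n grid whose two
    endpoints are assigned to different nodes."""
    cuts = 0
    for i in range(n):
        for j in range(n):
            if i + 1 < n and owner(i, j) != owner(i + 1, j):
                cuts += 1
            if j + 1 < n and owner(i, j) != owner(i, j + 1):
                cuts += 1
    return cuts


n, q = 16, 4   # 16 x 16 index space, 4 x 4 = 16 nodes (illustrative sizes)
p = q * q

linear_cuts = cut_edges(lambda i, j: linearized_owner(i, j, n, p), n)
block_cuts = cut_edges(lambda i, j: multidim_owner(i, j, n, q), n)

print("linearized address space:", linear_cuts, "off-node neighbor pairs")
print("multidimensional address space:", block_cuts, "off-node neighbor pairs")
```

With these sizes the linearized mapping assigns one full row to each node, so every vertical neighbor pair crosses a node boundary (240 pairs), whereas the 4 x 4 blocks of the two-dimensional mapping cut only 96 pairs. The ratio depends on the chosen sizes; the sketch does not model the Connection Machine network itself.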

Editor information

F. Meyer, B. Monien, A. L. Rosenberg

Copyright information

© 1993 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Johnsson, S.L. (1993). Massively parallel computing: Data distribution and communication. In: Meyer, F., Monien, B., Rosenberg, A.L. (eds) Parallel Architectures and Their Efficient Use. Nixdorf 1992. Lecture Notes in Computer Science, vol 678. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-56731-3_9

  • DOI: https://doi.org/10.1007/3-540-56731-3_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-56731-8

  • Online ISBN: 978-3-540-47637-5

  • eBook Packages: Springer Book Archive
