# A framework for analyzing locality and portability issues in parallel computing

## Abstract

This work potentially affects two areas: interconnection network design and parallel programming methodology.

A key issue in designing parallel computers is the balance between computing power and communication capacity. As we have observed, several problems are inherently nonlocal and therefore require high communication capability for efficient implementation. We also listed several problems for which fast network implementations can be designed. Some of these problems, however, possess only limited locality and thus require relatively powerful communication networks (e.g., butterfly networks). In summary, we cannot yet give a clear answer to how powerful the communication networks we build must be; but as more becomes known about the locality of different problems, and as locality-exploiting algorithms are developed for more problems, a more complete answer will emerge.

Our ideas provide a methodology for developing portable parallel programs. Given a problem, the first step is to determine its gross locality, which in turn determines a native architecture for the problem. The next step is to design an algorithm on the native model that fully exploits locality. This algorithm can then be simulated on different architectures and is guaranteed to have good efficiency.
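As a rough illustration of the final step, the sketch below models the cost of simulating a native-model algorithm on a host machine with fewer processors. It assumes, hypothetically, that the host emulates the native machine with a constant slowdown per simulated step; the names `Run`, `simulate`, and `efficiency`, the `overhead` parameter, and all numbers are illustrative and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """Cost of running an algorithm: processor count and parallel time (in steps)."""
    processors: int
    time: float

    @property
    def work(self) -> float:
        # Work = processors x time, the usual processor-time product.
        return self.processors * self.time


def simulate(native: Run, host_processors: int, overhead: float) -> Run:
    """Hypothetical emulation of a native-model run on a smaller host machine.

    Assumes each host processor simulates native.processors / host_processors
    native processors, and that the host network delivers the required messages
    with a constant per-step slowdown `overhead`. This is an assumption: it can
    only hold when the host's network is strong enough for the problem's locality.
    """
    if not 0 < host_processors <= native.processors:
        raise ValueError("host_processors must be in 1..native.processors")
    load = native.processors / host_processors
    return Run(host_processors, native.time * load * overhead)


def efficiency(native: Run, host: Run) -> float:
    # Ratio of the native algorithm's work to the work spent by the host run.
    return native.work / host.work


if __name__ == "__main__":
    native = Run(processors=1024, time=10.0)  # illustrative numbers only
    for p in (16, 64, 256):
        host = simulate(native, p, overhead=2.0)
        print(p, host.time, efficiency(native, host))
```

Under this assumed cost model the efficiency stays at `1 / overhead` (0.5 here) regardless of host size. If the host's network is too weak for the problem's locality, the per-step slowdown would no longer be constant and efficiency would fall, which is why the native model is chosen to match the problem's locality first.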

## Keywords

Convex Hull, Communication Complexity, Native Model, Nonlocal Problem, Processor Network