Evaluating Arithmetic Expressions Using Tree Contraction: A Fast and Scalable Parallel Implementation for Symmetric Multiprocessors (SMPs)
The ability to provide uniform shared-memory access to a significant number of processors in a single SMP node brings us much closer to the ideal PRAM parallel computer. In this paper, we develop new techniques for designing a uniform shared-memory algorithm from a PRAM algorithm and present the results of an extensive experimental study demonstrating that the resulting programs scale nearly linearly across a significant range of processors and across the entire range of instance sizes tested. This linear speedup with the number of processors is one of the first ever attained in practice for intricate combinatorial problems. The example we present in detail here is for evaluating arithmetic expression trees using the algorithmic techniques of list ranking and tree contraction; this problem is not only of interest in its own right, but is representative of a large class of irregular combinatorial problems that have simple and efficient sequential implementations and fast PRAM algorithms, but have no known efficient parallel implementations. Our results thus offer promise for bridging the gap between the theory and practice of shared-memory parallel algorithms.
Keywords: Expression Evaluation, Tree Contraction, Parallel Graph Algorithms, Shared Memory, High-Performance Algorithm Engineering
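To make the tree-contraction idea named in the abstract concrete, here is a minimal sketch in Python. It is not the paper's SMP implementation: it simulates the rounds sequentially, and it assumes a full binary expression tree whose only operators are `+` and `*`. The names `Node`, `rake`, `leaves_in_order`, and `evaluate_by_contraction` are hypothetical, chosen for illustration. Each node carries a pending linear function `a*x + b`; raking a leaf folds its value and its parent's operator into the sibling's linear function, and the sibling then takes the parent's place. In each round the leaves at odd positions are raked (left children first, then right children), which is the schedule that makes the rakes independent on a PRAM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; nodes reference each other
class Node:
    op: Optional[str] = None      # '+' or '*' for internal nodes, None for leaves
    val: int = 0                  # constant stored at a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None
    a: int = 1                    # pending linear function a*x + b,
    b: int = 0                    # applied to this node's own value x


def leaf(v: int) -> Node:
    return Node(val=v)


def node(op: str, l: Node, r: Node) -> Node:
    n = Node(op=op, left=l, right=r)
    l.parent = r.parent = n
    return n


def leaves_in_order(root: Node) -> List[Node]:
    """Collect the leaves from left to right (iterative traversal)."""
    out, stack = [], [root]
    while stack:
        n = stack.pop()
        if n.left is None:
            out.append(n)
        else:
            stack.append(n.right)
            stack.append(n.left)
    return out


def rake(u: Node) -> None:
    """Remove leaf u and its parent p; the sibling s absorbs their effect."""
    p = u.parent
    s = p.right if p.left is u else p.left
    c = u.a * u.val + u.b                      # value contributed by u
    if p.op == '+':                            # p.a*(c + s.a*x + s.b) + p.b
        s.a, s.b = p.a * s.a, p.a * (c + s.b) + p.b
    else:                                      # p.a*(c*(s.a*x + s.b)) + p.b
        s.a, s.b = p.a * c * s.a, p.a * c * s.b + p.b
    g = p.parent                               # splice s into p's position
    s.parent = g
    if g is not None:
        if g.left is p:
            g.left = s
        else:
            g.right = s
    u.parent = None                            # mark u as removed


def evaluate_by_contraction(root: Node) -> int:
    leaves = leaves_in_order(root)
    while len(leaves) > 1:
        odd = leaves[0::2]                     # 1-based odd positions
        for want_left in (True, False):        # two phases per round
            for u in odd:                      # independent rakes on a PRAM
                p = u.parent
                if p is not None and (p.left is u) == want_left:
                    rake(u)
        leaves = leaves[1::2]                  # surviving leaves, renumbered
    last = leaves[0]                           # the tree is now a single leaf
    return last.a * last.val + last.b


tree = node('*', node('+', leaf(2), leaf(3)), node('+', leaf(4), leaf(5)))
print(evaluate_by_contraction(tree))
```

Each round removes every odd-positioned leaf, so the number of rounds is logarithmic in the number of leaves. A parallel version would run the inner loop of each phase concurrently. Numbering the leaves from left to right is the step that list ranking would supply in a parallel setting.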