Abstract
The BSP (bulk synchronous parallel) model has been gaining adherents as a standard model for programming parallel computers. When programmed in direct mode, the BSP model is supposed to predict runtimes for specific machines from just three parameters. Although it does a very good job in many situations, it sometimes predicts unnecessarily high levels of slackness. We present two refinements of the BSP model aimed at enhancing its predictive power: one accounts for submachine locality, and the other reflects router load more accurately. We illustrate how the refined models yield better estimates of algorithm performance with manageable accounting of costs. In particular, we look at parallel prefix, FFT, and matrix multiplication. The refined models more effectively capture the amount of slackness these problems need to execute efficiently.
The research of this author was supported in part by the National Science Foundation under Grant CCR-9410592.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
de la Torre, P., Kruskal, C.P. (1996). Submachine locality in the bulk synchronous setting. In: Bougé, L., Fraigniaud, P., Mignotte, A., Robert, Y. (eds) Euro-Par'96 Parallel Processing. Euro-Par 1996. Lecture Notes in Computer Science, vol 1124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0024723
DOI: https://doi.org/10.1007/BFb0024723
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61627-6
Online ISBN: 978-3-540-70636-6
eBook Packages: Springer Book Archive