From Serial Loops to Parallel Execution on Distributed Systems

Bosilca, George; Bouteiller, Aurelien; Danalis, Anthony; Herault, Thomas; Dongarra, Jack

doi:10.1007/978-3-642-32820-6_25

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7484)

Included in the following conference series:

European Conference on Parallel Processing

Abstract

Programmability and performance portability are two major challenges in today’s dynamic environment. Algorithm designers targeting efficient algorithms should focus on designing high-level algorithms exhibiting maximum parallelism, while relying on compilers and run-time systems to discover and exploit this parallelism, delivering sustainable performance on a variety of hardware. The compiler tool presented in this paper can analyze the data flow of serial codes with imperfectly nested, affine loop-nests and if statements, commonly found in scientific applications. This tool operates as the front-end compiler for the DAGuE run-time system by automatically converting serial codes into the symbolic representation of their data flow. We show how the compiler analyzes the data flow, and demonstrate that scientifically important, dense linear algebra operations can benefit from this analysis, and deliver high performance on large scale platforms.
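
As an illustration of the kind of input the abstract describes, the following is a minimal sketch, not the paper's compiler. It assumes a tiled Cholesky factorization as the serial code, with the hypothetical task names POTRF, TRSM, SYRK and GEMM standing in for the tile kernels. The loop nest is imperfectly nested (tasks appear at different depths) and every bound is an affine function of the enclosing indices. Where the tool described here derives the data flow symbolically, this sketch simply unrolls the loops for a fixed number of tiles and records, for each tile a task reads, which task wrote it last.

```python
def cholesky_tasks(nt):
    """Yield (name, indices, reads, writes) in serial order for an nt x nt tile matrix.

    Tiles are identified by (row, col). Written tiles are updated in place,
    so a task also reads every tile it writes.
    """
    for k in range(nt):
        yield ("POTRF", (k,), [], [(k, k)])
        for m in range(k + 1, nt):
            yield ("TRSM", (k, m), [(k, k)], [(m, k)])
        for m in range(k + 1, nt):
            yield ("SYRK", (k, m), [(m, k)], [(m, m)])
            for n in range(k + 1, m):
                yield ("GEMM", (k, m, n), [(m, k), (n, k)], [(m, n)])


def flow_edges(nt):
    """Return flow dependences as (producer_task, consumer_task, tile) triples."""
    last_writer = {}  # tile -> most recent task that wrote it
    edges = []
    for name, idx, reads, writes in cholesky_tasks(nt):
        task = (name, idx)
        # In-place updates mean written tiles count as inputs as well.
        for tile in reads + writes:
            if tile in last_writer:
                edges.append((last_writer[tile], task, tile))
        for tile in writes:
            last_writer[tile] = task
    return edges


if __name__ == "__main__":
    for producer, consumer, tile in flow_edges(3):
        print(producer, "->", consumer, "via tile", tile)
```

The edge list is the explicit, unrolled form of a task graph for one problem size. A symbolic representation, as targeted by the tool in this paper, would instead express the same producer and consumer relations as functions of k, m and n, so that its size does not grow with the number of tiles.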

Author information

Authors and Affiliations

University of Tennessee, Knoxville, TN, 37996, USA
George Bosilca, Aurelien Bouteiller, Anthony Danalis, Thomas Herault & Jack Dongarra
University of Manchester, Manchester, UK
Jack Dongarra

Editor information

Editors and Affiliations

University of Patras, Computer Technology Institute and Press “Diophantus”, N. Kazantzaki, 26504, Rio, Greece
Christos Kaklamanis
University of Patras, University Building B, 26504, Rio, Greece
Theodore Papatheodorou
Computer Technology Institute and Press “Diophantus”, University of Patras, N. Kazantzaki, 26504, Rio, Greece
Paul G. Spirakis

About this paper

Cite this paper

Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Dongarra, J. (2012). From Serial Loops to Parallel Execution on Distributed Systems. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_25

DOI: https://doi.org/10.1007/978-3-642-32820-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)
