Abstract
Programmability and performance portability are two major challenges in today's rapidly changing computing environment. Designers of efficient algorithms should focus on high-level algorithms that expose maximum parallelism, and rely on compilers and run-time systems to discover and exploit this parallelism and deliver sustainable performance across a variety of hardware. The compiler tool presented in this paper analyzes the data flow of serial codes containing imperfectly nested affine loop nests and if statements, as commonly found in scientific applications. It serves as the front-end compiler of the DAGuE run-time system, automatically converting serial code into a symbolic representation of its data flow. We show how the compiler performs this analysis and demonstrate that scientifically important dense linear algebra operations benefit from it, delivering high performance on large-scale platforms.
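To make data-flow analysis over an affine loop nest concrete, the Python sketch below uses a serial tiled Cholesky factorization. This is our choice of example, and the task names POTRF, TRSM, SYRK and GEMM follow the usual tiled linear algebra convention. Its loops are imperfectly nested: POTRF sits at depth one and GEMM at depth three. `dynamic_flow_deps` finds, by brute-force enumeration of the serial execution, the task instance that last wrote each tile before it is read. `symbolic_flow_dep` gives hand-derived closed-form expressions for the same dependences as functions of the loop indices only, guarded by conditions such as `k == 0`. All names are hypothetical, and the code only illustrates the kind of result such an analysis produces. It is not the algorithm used by the DAGuE front-end compiler.

```python
from typing import Dict, List, Optional, Tuple

Task = Tuple[object, ...]   # hypothetical encoding, e.g. ("GEMM", k, m, n)
Tile = Tuple[int, int]      # (row, column) tile index


def serial_trace(nt: int) -> List[Tuple[Task, List[Tile], List[Tile]]]:
    """Task instances of a serial tiled Cholesky loop nest, in program
    order, with the tiles each instance reads and writes."""
    trace = []
    for k in range(nt):
        trace.append((("POTRF", k), [(k, k)], [(k, k)]))
        for m in range(k + 1, nt):
            trace.append((("TRSM", k, m), [(k, k), (m, k)], [(m, k)]))
        for n in range(k + 1, nt):
            trace.append((("SYRK", k, n), [(n, k), (n, n)], [(n, n)]))
            for m in range(n + 1, nt):
                trace.append((("GEMM", k, m, n),
                              [(m, k), (n, k), (m, n)], [(m, n)]))
    return trace


def dynamic_flow_deps(nt: int) -> Dict[Tuple[Task, Tile], Optional[Task]]:
    """Brute force: map each (task, tile read) to the last task that wrote
    that tile earlier in program order (None means initial input data)."""
    last_writer: Dict[Tile, Task] = {}
    deps: Dict[Tuple[Task, Tile], Optional[Task]] = {}
    for task, reads, writes in serial_trace(nt):
        for tile in reads:
            deps[(task, tile)] = last_writer.get(tile)
        for tile in writes:
            last_writer[tile] = task
    return deps


def symbolic_flow_dep(task: Task, tile: Tile) -> Optional[Task]:
    """Hand-derived closed form of the same flow dependences, computed from
    the loop indices alone, without enumerating the iteration space."""
    kind, k, *rest = task
    if kind == "POTRF":
        return None if k == 0 else ("SYRK", k - 1, k)
    if kind == "TRSM":
        (m,) = rest
        if tile == (k, k):
            return ("POTRF", k)
        return None if k == 0 else ("GEMM", k - 1, m, k)
    if kind == "SYRK":
        (n,) = rest
        if tile == (n, k):
            return ("TRSM", k, n)
        return None if k == 0 else ("SYRK", k - 1, n)
    if kind == "GEMM":
        m, n = rest
        if tile == (m, k):
            return ("TRSM", k, m)
        if tile == (n, k):
            return ("TRSM", k, n)
        return None if k == 0 else ("GEMM", k - 1, m, n)
    raise ValueError(f"unknown task kind: {kind}")
```

Checking that both functions agree for several tile counts, for example with `all(symbolic_flow_dep(t, x) == s for (t, x), s in dynamic_flow_deps(6).items())`, shows why the symbolic form is attractive: its size does not depend on the problem size, whereas the enumerated trace grows with the number of tasks.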
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bosilca, G., Bouteiller, A., Danalis, A., Herault, T., Dongarra, J. (2012). From Serial Loops to Parallel Execution on Distributed Systems. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds) Euro-Par 2012 Parallel Processing. Euro-Par 2012. Lecture Notes in Computer Science, vol 7484. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32820-6_25
DOI: https://doi.org/10.1007/978-3-642-32820-6_25
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32819-0
Online ISBN: 978-3-642-32820-6
eBook Packages: Computer Science, Computer Science (R0)