Abstract
Message passing is a predominant programming paradigm for distributed-memory systems. RDMA-capable networks such as InfiniBand and Myrinet can reduce communication overhead by overlapping communication with computation. To make this overlap more effective, we propose a source-to-source transformation scheme that automatically restructures message-passing codes. We extend the control-flow graph so that it accurately models message-passing programs and supports effective data-flow analysis. This analysis identifies the minimal region between producer and consumer that contains the message-passing function calls. Using inter-procedural data-flow analysis, the transformation scheme then enables communication to overlap with computation. Experiments on the NAS Parallel Benchmarks show that, on distributed-memory systems, the versions employing communication-computation overlap are faster than the original programs.
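The producer-consumer idea can be illustrated with a small sketch. It is not the paper's algorithm: it assumes straight-line code with no branches, loops, or procedure calls, a single message buffer, and hand-written def/use sets, whereas the abstract describes an extended control-flow graph with inter-procedural analysis. The names `Stmt`, `overlap_window`, and `transform` are hypothetical, and the MPI calls appear only as abbreviated strings, not with their full argument lists.

```python
from dataclasses import dataclass, field


@dataclass
class Stmt:
    text: str                                  # source text of the statement
    defs: set = field(default_factory=set)     # variables the statement writes
    uses: set = field(default_factory=set)     # variables the statement reads


def overlap_window(stmts, i, buf, kind):
    """Return (start, wait) for the blocking call on `buf` at index i.

    start: earliest index the non-blocking call can be hoisted to.
    wait:  index of the first later statement that must follow the wait.
    """
    if kind == "send":
        # A send only reads the buffer, so only writes to it conflict.
        conflict = lambda s: buf in s.defs
    else:
        # A receive writes the buffer, so reads and writes both conflict.
        conflict = lambda s: buf in s.defs or buf in s.uses

    start = i
    while start > 0 and not conflict(stmts[start - 1]):
        start -= 1                             # hoist above independent work
    wait = i + 1
    while wait < len(stmts) and not conflict(stmts[wait]):
        wait += 1                              # sink below independent work
    return start, wait


def transform(stmts, i, buf, kind):
    """Replace the blocking call at index i with a non-blocking call + wait."""
    start, wait = overlap_window(stmts, i, buf, kind)
    out = [s.text for s in stmts[:start]]
    out.append(f"MPI_I{kind}({buf}, &req)")    # abbreviated, not a full signature
    out += [s.text for s in stmts[start:i]]    # work now overlapped before i
    out += [s.text for s in stmts[i + 1:wait]] # work now overlapped after i
    out.append("MPI_Wait(&req)")               # abbreviated, not a full signature
    out += [s.text for s in stmts[wait:]]
    return out


# Hypothetical straight-line program: `a` is produced, sent, then overwritten.
program = [
    Stmt("a = compute_boundary()", defs={"a"}),
    Stmt("x = interior_1()", defs={"x"}),
    Stmt("MPI_Send(a)", uses={"a"}),
    Stmt("y = interior_2(x)", defs={"y"}, uses={"x"}),
    Stmt("a = next_boundary(y)", defs={"a"}, uses={"y"}),
]

if __name__ == "__main__":
    for line in transform(program, 2, "a", "send"):
        print(line)
```

In this toy program the send is started immediately after the statement that produces `a`, and the wait is placed just before `a` is overwritten, so both interior computations can proceed while the message is in flight. A real transformation would also have to preserve the relative order of other communication calls and handle control flow, which this sketch ignores.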
© 2008 IFIP International Federation for Information Processing
About this paper
Cite this paper
Hu, C., Shao, Y., Wang, J., Li, J. (2008). Automatic Transformation for Overlapping Communication and Computation. In: Cao, J., Li, M., Wu, MY., Chen, J. (eds) Network and Parallel Computing. NPC 2008. Lecture Notes in Computer Science, vol 5245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88140-7_19
DOI: https://doi.org/10.1007/978-3-540-88140-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88139-1
Online ISBN: 978-3-540-88140-7
eBook Packages: Computer Science (R0)