Automatic Transformation for Overlapping Communication and Computation

Hu, Changjun; Shao, Yewei; Wang, Jue; Li, Jianjiang

doi:10.1007/978-3-540-88140-7_19


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5245)

Included in the following conference series:

IFIP International Conference on Network and Parallel Computing


Abstract

Message passing is a predominant programming paradigm for distributed-memory systems. RDMA networks such as InfiniBand and Myrinet reduce communication overhead by allowing communication to overlap with computation. To make this overlap more effective, we propose a source-to-source transformation scheme that automatically restructures message-passing code. Extensions to the control-flow graph allow message-passing programs to be analyzed accurately and make data-flow analysis effective. This analysis identifies the minimal region between producer and consumer that contains the message-passing function calls. Using inter-procedural data-flow analysis, the transformation scheme enables communication to overlap with computation. Experiments on the well-known NAS Parallel Benchmarks show that, on distributed-memory systems, the versions employing communication-computation overlap run faster than the original programs.
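The core idea, widening the region between the producer and the consumer of a message buffer so that a blocking call can be split into a non-blocking post and a later wait, can be illustrated with a small sketch. It assumes a single basic block whose statements have known def/use sets; the paper's analysis works on an extended control-flow graph and is inter-procedural, which this sketch does not attempt. `Stmt`, `overlap_window`, `split_blocking` and the example statements are hypothetical; only `MPI_Send`, `MPI_Isend` and `MPI_Wait` are real MPI routines, and their argument lists are abbreviated here.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Stmt:
    """One statement of a basic block with the variables it writes and reads."""
    text: str
    defs: Set[str] = field(default_factory=set)
    uses: Set[str] = field(default_factory=set)

    def touches(self, buf: str) -> bool:
        # Conservative conflict test: any read or write of the message buffer.
        return buf in self.defs or buf in self.uses


def overlap_window(block: List[Stmt], comm: int, buf: str) -> Tuple[int, int]:
    """Widen the region around the blocking call at index `comm`.

    Returns (post, wait): the non-blocking call may be posted right before
    block[post] (just after the last earlier statement touching `buf`), and it
    must complete right before block[wait] (the first later statement touching
    `buf`, or len(block) if there is none).
    """
    post = comm
    while post > 0 and not block[post - 1].touches(buf):
        post -= 1
    wait = comm + 1
    while wait < len(block) and not block[wait].touches(buf):
        wait += 1
    return post, wait


def split_blocking(block: List[Stmt], comm: int, buf: str,
                   start_text: str, wait_text: str) -> List[str]:
    """Replace the blocking call with a post/wait pair spanning the window."""
    post, wait = overlap_window(block, comm, buf)
    # Drop the blocking call; original indices after `comm` shift down by one.
    texts = [s.text for s in block[:comm] + block[comm + 1:]]
    return (texts[:post] + [start_text] + texts[post:wait - 1]
            + [wait_text] + texts[wait - 1:])


# Hypothetical straight-line code: `a` is produced, sent, then overwritten.
example = [
    Stmt("a = produce()", defs={"a"}),
    Stmt("y = work1(x)", defs={"y"}, uses={"x"}),
    Stmt("MPI_Send(a, ...)", uses={"a"}),
    Stmt("z = work2(y)", defs={"z"}, uses={"y"}),
    Stmt("a = consume(a)", defs={"a"}, uses={"a"}),
]

if __name__ == "__main__":
    for line in split_blocking(example, 2, "a", "MPI_Isend(a, ..., &req)",
                               "MPI_Wait(&req, MPI_STATUS_IGNORE)"):
        print(line)
```

In this example the send is posted right after `produce()` and completed just before `consume(a)`, so `work1` and `work2` can execute while the message is in flight. Treating any access to the buffer as a conflict is deliberately conservative: it is safe for both sends and receives, at the cost of missing some overlap opportunities.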



Author information

Authors and Affiliations

School of Information Engineering, University of Science and Technology Beijing, NO.30 Xueyuan Road, Haidian District, Beijing, P.R. China
Changjun Hu, Yewei Shao, Jue Wang & Jianjiang Li


Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 800 Dongchuan Road, 200240, Shanghai, China
Jian Cao, Minglu Li & Min-You Wu
Centre for Complex Software Systems and Services, Faculty of Information & Communication Technologies, Swinburne University of Technology, 1 Alfred Street, Hawthorn, 3122, Melbourne, Victoria, Australia
Jinjun Chen


About this paper

Cite this paper

Hu, C., Shao, Y., Wang, J., Li, J. (2008). Automatic Transformation for Overlapping Communication and Computation. In: Cao, J., Li, M., Wu, MY., Chen, J. (eds) Network and Parallel Computing. NPC 2008. Lecture Notes in Computer Science, vol 5245. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88140-7_19


DOI: https://doi.org/10.1007/978-3-540-88140-7_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88139-1
Online ISBN: 978-3-540-88140-7
eBook Packages: Computer Science, Computer Science (R0)
