Send-Receive Considered Harmful: Toward Structured Parallel Programming

  • Sergei Gorlatch


During the software crisis of the 1960s, Dijkstra’s famous thesis “goto considered harmful” paved the way for structured programming of sequential computers. This short communication suggests that many current difficulties and challenges of parallel programming based on message passing are caused by poorly structured, pair-wise communication, which is a consequence of using low-level send-receive primitives. We argue that, like goto in sequential programs, send-receive should be avoided as far as possible. A viable alternative in the setting of message passing is the use of collective operations, already present in MPI (the Message Passing Interface). We dispute some widely held opinions about the apparent superiority of unstructured pair-wise communication over well-structured collective operations, and we present substantial theoretical and empirical evidence to the contrary in the context of the MPI framework.
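To make the contrast concrete, the following sketch (plain Python, deliberately not MPI; the function names and the list-based simulation of ranks are illustrative assumptions, not anything from the paper) contrasts the two styles. The pairwise version models rank 0 receiving one message from every other rank in turn, while the collective version encapsulates a binomial-tree reduction, the kind of structured pattern an MPI_Reduce implementation may use internally, behind a single call.

```python
def reduce_pairwise(values):
    """Unstructured style: rank 0 receives one message from each of the
    other P-1 ranks and accumulates as it goes -- P-1 sequential steps,
    and the communication pattern is spelled out by the programmer."""
    acc = values[0]
    for v in values[1:]:          # each iteration models one send/recv pair
        acc += v
    return acc

def reduce_tree(values):
    """Collective style: a binomial-tree reduction taking ceil(log2 P)
    rounds. The caller sees one operation; the pattern is hidden inside,
    so the implementation is free to choose the best schedule."""
    vals = list(values)
    stride = 1
    while stride < len(vals):
        # In each round, rank i+stride "sends" its partial sum to rank i.
        for i in range(0, len(vals) - stride, 2 * stride):
            vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]               # rank 0 holds the result
```

Both compute the same reduction; the point is that the tree schedule, and any future improvement to it, lives behind the collective's interface rather than being re-derived in every application program.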


Keywords: Parallel programming, programming methodology, application performance, Message Passing Interface (MPI)



I am grateful to many colleagues in the field of parallel computing, whose research provided necessary theoretical and experimental evidence to support the ideas presented here. It is my pleasure to acknowledge the very helpful comments of Chris Lengauer, Robert van de Geijn, Murray Cole, Jan Prins, Thilo Kielmann, Holger Bischof, and Phil Bacon on the preliminary version of the manuscript.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Münster, Münster, Germany
