Abstract
Data communicated by an application often originates from non-contiguous locations in memory and is commonly serialized (copied) into send buffers or deserialized from receive buffers. MPI datatypes offer a way to avoid such intermediate copies and to optimize communication; however, it is often unclear which implementation and optimization choices are most useful in practice. We extracted the send/receive-buffer access patterns of a representative set of scientific applications into micro-applications that isolate those patterns, and we observed that the buffer-access patterns of applications fall into three categories. Our micro-applications show that up to 90% of the total communication time can be spent on local serialization, and we found significant performance discrepancies between state-of-the-art MPI implementations. The micro-applications aim to provide a standard benchmark for MPI datatype implementations to guide optimization, much as SPEC CPU and the Livermore Loops do for compiler optimizations.
References
Aiken, A., Nicolau, A.: Optimal loop parallelization. SIGPLAN Not. 23(7), 308–317 (1988), http://doi.acm.org/10.1145/960116.54021
Alverson, R., Roweth, D., Kaplan, L.: The Gemini System Interconnect. In: 18th IEEE Symp. on High Performance Interconnects, pp. 83–87 (2010)
Barrett, R.F., Heroux, M.A., et al.: Poster: mini-applications: vehicles for co-design. In: Proceedings of the 2011 Companion on High Performance Computing Networking, Storage and Analysis Companion, SC 2011 Companion, pp. 1–2. ACM (2011)
Bernard, C., Ogilvie, M.C., DeGrand, T.A., et al.: Studying quarks and gluons on MIMD parallel computers. High Performance Computing Applications (1991)
Brunner, T.A.: Mulard: A multigroup thermal radiation diffusion mini-application. Tech. rep., DOE Exascale Research Conference (2012)
Byna, S., Gropp, W., Sun, X.H., Thakur, R.: Improving the performance of MPI derived datatypes by optimizing memory-access cost. In: Cluster Computing (2003)
Carrington, L., Komatitsch, D., et al.: High-frequency simulations of global seismic wave propagation using SPECFEM3D_GLOBE on 62K processors. In: ACM/IEEE Conference on Supercomputing (2008)
Dixit, K.M.: The SPEC benchmarks. Parallel Computing 17 (1991)
Gropp, W., Hoefler, T., Thakur, R., Träff, J.L.: Performance Expectations and Guidelines for MPI Derived Datatypes. In: Cotronis, Y., Danalis, A., Nikolopoulos, D.S., Dongarra, J. (eds.) EuroMPI 2011. LNCS, vol. 6960, pp. 150–159. Springer, Heidelberg (2011)
Heroux, M.A., Doerfler, D.W., et al.: Improving performance via mini-applications. Tech. rep., Sandia National Laboratories, SAND 2009-5574 (2009)
Hoefler, T., Gottlieb, S.: Parallel Zero-Copy Algorithms for Fast Fourier Transform and Conjugate Gradient Using MPI Datatypes. In: Keller, R., Gabriel, E., Resch, M., Dongarra, J. (eds.) EuroMPI 2010. LNCS, vol. 6305, pp. 132–141. Springer, Heidelberg (2010)
Lu, Q., Wu, J., Panda, D., Sadayappan, P.: Applying MPI derived datatypes to the NAS benchmarks: A case study. In: Intl. Conf. on Parallel Processing (2004)
McMahon, F.H.: The Livermore Fortran Kernels: A computer test of the numerical performance range. Tech. rep., Lawrence Livermore National Laboratory, UCRL-53745 (1986)
MPI Forum: MPI: A Message-Passing Interface Standard. Version 2.2 (2009)
Plimpton, S.: Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117(1) (1995)
Reussner, R., Träff, J.L., Hunzelmann, G.: A Benchmark for MPI Derived Datatypes. In: Dongarra, J., Kacsuk, P., Podhorszki, N. (eds.) PVM/MPI 2000. LNCS, vol. 1908, pp. 10–17. Springer, Heidelberg (2000)
Skamarock, W.C., Klemp, J.B.: A time-split nonhydrostatic atmospheric model for weather research and forecasting applications. J. Comput. Phys. 227(7), 3465–3485 (2008), http://dx.doi.org/10.1016/j.jcp.2007.01.037
Träff, J.L., Hempel, R., Ritzdorf, H., Zimmermann, F.: Flattening on the Fly: Efficient Handling of MPI Derived Datatypes. In: Margalef, T., Dongarra, J., Luque, E. (eds.) PVM/MPI 1999. LNCS, vol. 1697, pp. 109–116. Springer, Heidelberg (1999)
Van der Wijngaart, R.F., Wong, P.: NAS parallel benchmarks version 2.4. Tech. rep. NAS-02-007 (2002)
Wu, J., Wyckoff, P., Panda, D.: High performance implementation of MPI derived datatype communication over InfiniBand. In: Parallel and Distributed Processing Symposium (2004)
© 2012 Springer-Verlag Berlin Heidelberg
Schneider, T., Gerstenberger, R., Hoefler, T. (2012). Micro-applications for Communication Data Access Patterns and MPI Datatypes. In: Träff, J.L., Benkner, S., Dongarra, J.J. (eds) Recent Advances in the Message Passing Interface. EuroMPI 2012. Lecture Notes in Computer Science, vol 7490. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33518-1_17
Print ISBN: 978-3-642-33517-4
Online ISBN: 978-3-642-33518-1