Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns

Hermanns, Marc-André; Geimer, Markus; Mohr, Bernd; Wolf, Felix

doi:10.1007/978-3-642-03770-2_10

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5759)

Abstract

Wait states in parallel applications can be identified by scanning event traces for characteristic patterns. In our earlier work, we have defined such patterns for MPI-2 one-sided communication, although still based on a trace-analysis scheme with limited scalability. Taking advantage of a new scalable trace-analysis approach based on a parallel replay, which was originally developed for MPI-1 point-to-point and collective communication, we show how wait states in one-sided communications can be detected in a more scalable fashion. We demonstrate the scalability of our method and its usefulness for the optimization cycle with applications running on up to 8,192 cores.
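
To make the idea of scanning a trace for a wait-state pattern concrete, the following is a minimal serial sketch, not the parallel-replay method of the paper. It assumes a simplified pattern in which a process that enters a collective window synchronization (such as MPI_Win_fence) early waits until the last participant has entered. The `FenceEvent` record, its `fence_id` field, and the function `fence_wait_times` are hypothetical names introduced only for this illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FenceEvent:
    """One fence call in a trace (hypothetical record layout)."""
    rank: int       # process that called the fence
    fence_id: int   # assumed id matching the same fence across ranks
    enter: float    # timestamp at which the rank entered the call
    leave: float    # timestamp at which the rank left the call


def fence_wait_times(events):
    """Return {(fence_id, rank): waiting time} for the simplified pattern.

    Assumption: a rank waits from its own enter time until the last
    participant of the same fence has entered.
    """
    by_fence = defaultdict(list)
    for ev in events:
        by_fence[ev.fence_id].append(ev)

    waits = {}
    for fence_id, group in by_fence.items():
        last_enter = max(ev.enter for ev in group)
        for ev in group:
            # Cap the waiting time by the time actually spent in the call.
            wait = min(last_enter, ev.leave) - ev.enter
            waits[(fence_id, ev.rank)] = max(0.0, wait)
    return waits


# Three ranks take part in one fence; rank 1 arrives last.
trace = [
    FenceEvent(rank=0, fence_id=0, enter=1.0, leave=4.5),
    FenceEvent(rank=1, fence_id=0, enter=4.0, leave=4.5),
    FenceEvent(rank=2, fence_id=0, enter=2.5, leave=4.5),
]
waits = fence_wait_times(trace)
for (fence_id, rank), wait in sorted(waits.items()):
    print(f"fence {fence_id}, rank {rank}: waited {wait:.1f}")
```

This sketch gathers all events in one place, which is the kind of centralized step that limits scalability. A replay-based analysis would instead let each process inspect its own local trace and exchange only the timestamps it needs.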

Author information

Authors and Affiliations

Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany
Marc-André Hermanns, Markus Geimer, Bernd Mohr & Felix Wolf
Department of Computer Science, RWTH Aachen University, Germany
Felix Wolf

Editor information

Editors and Affiliations

Department of Information Technology, Åbo Akademi, 20500, Turku, Finland
Matti Ropo & Jan Westerholm
Department of Electrical Engineering and Computer Science, University of Tennessee, 37996-3450, Knoxville, TN, USA
Jack Dongarra

About this paper

Cite this paper

Hermanns, MA., Geimer, M., Mohr, B., Wolf, F. (2009). Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns. In: Ropo, M., Westerholm, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2009. Lecture Notes in Computer Science, vol 5759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03770-2_10

DOI: https://doi.org/10.1007/978-3-642-03770-2_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03769-6
Online ISBN: 978-3-642-03770-2
eBook Packages: Computer Science, Computer Science (R0)
