Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 5759)

Abstract

Wait states in parallel applications can be identified by scanning event traces for characteristic patterns. In our earlier work, we have defined such patterns for MPI-2 one-sided communication, although still based on a trace-analysis scheme with limited scalability. Taking advantage of a new scalable trace-analysis approach based on a parallel replay, which was originally developed for MPI-1 point-to-point and collective communication, we show how wait states in one-sided communications can be detected in a more scalable fashion. We demonstrate the scalability of our method and its usefulness for the optimization cycle with applications running on up to 8,192 cores.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hermanns, M.-A., Geimer, M., Mohr, B., Wolf, F. (2009). Scalable Detection of MPI-2 Remote Memory Access Inefficiency Patterns. In: Ropo, M., Westerholm, J., Dongarra, J. (eds) Recent Advances in Parallel Virtual Machine and Message Passing Interface. EuroPVM/MPI 2009. Lecture Notes in Computer Science, vol 5759. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03770-2_10

  • DOI: https://doi.org/10.1007/978-3-642-03770-2_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03769-6

  • Online ISBN: 978-3-642-03770-2

  • eBook Packages: Computer Science, Computer Science (R0)
