ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale

Mueller, Frank; Wu, Xing; Schulz, Martin; de Supinski, Bronis R.; Gamblin, Todd

doi:10.1007/978-3-642-28145-7_40

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7134)

Included in the following conference series:

International Workshop on Applied Parallel Computing

Abstract

Characterizing the communication behavior of large-scale applications is difficult and costly because of code and system complexity and long execution times. An alternative to running the actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless, scalable trace collection, we contribute an approach that produces communication traces that are orders of magnitude smaller, if not near constant in size, regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events, develop a scheme to preserve the time and causality of communication events, and present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and on trace extrapolation. To the best of our knowledge, such a concise, scalable representation of MPI traces combined with time-preserving deterministic MPI call replay is without precedent.
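
The intra-node compression idea mentioned in the abstract can be illustrated with a minimal sketch. It is not ScalaTrace's actual algorithm or trace format: the function names `compress` and `expand`, the `max_window` parameter, and the use of plain strings as events are all assumptions made for illustration. The sketch only shows why a loop that issues the same MPI calls many times can be stored as one loop body plus a repeat count, without losing information.

```python
from typing import List, Sequence, Tuple

# Hypothetical compressed record: (loop body, number of repetitions).
Block = Tuple[Tuple[str, ...], int]


def compress(events: Sequence[str], max_window: int = 8) -> List[Block]:
    """Fold immediately repeated subsequences into (body, repeat_count) blocks.

    Greedy illustration only: at each position, try every body length up to
    max_window and keep the one whose consecutive repeats cover the most events.
    """
    events = list(events)
    out: List[Block] = []
    i, n = 0, len(events)
    while i < n:
        best_w, best_reps = 1, 1
        for w in range(1, min(max_window, (n - i) // 2) + 1):
            body = events[i:i + w]
            reps = 1
            # Count how many times the candidate body repeats back to back.
            while events[i + reps * w:i + (reps + 1) * w] == body:
                reps += 1
            if reps > 1 and reps * w > best_w * best_reps:
                best_w, best_reps = w, reps
        out.append((tuple(events[i:i + best_w]), best_reps))
        i += best_w * best_reps
    return out


def expand(blocks: Sequence[Block]) -> List[str]:
    """Invert compress(): the encoding is lossless."""
    out: List[str] = []
    for body, reps in blocks:
        out.extend(list(body) * reps)
    return out


# A toy per-process trace: one setup call, a 1000-iteration exchange loop, one teardown call.
trace = ["MPI_Init"] + ["MPI_Send", "MPI_Recv"] * 1000 + ["MPI_Finalize"]
blocks = compress(trace)
```

In this toy trace the 2002 events fold into three blocks, and the block count stays the same however many iterations the loop runs, which is the intuition behind a near constant-size trace. Inter-node compression, which merges similar traces across processes, is not covered by this sketch.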

Author information

Authors and Affiliations

Dept. of Computer Science, North Carolina State University, Raleigh, NC, 27695-7534, USA
Frank Mueller & Xing Wu
Lawrence Livermore National Laboratory, Center for Applied Scientific Computing, Livermore, CA, 94551, USA
Martin Schulz, Bronis R. de Supinski & Todd Gamblin

Editor information

Kristján Jónasson

About this paper

Cite this paper

Mueller, F., Wu, X., Schulz, M., de Supinski, B.R., Gamblin, T. (2012). ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28145-7_40

DOI: https://doi.org/10.1007/978-3-642-28145-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28144-0
Online ISBN: 978-3-642-28145-7
eBook Packages: Computer Science, Computer Science (R0)