Abstract
Characterizing the communication behavior of large-scale applications is a difficult and costly task because of code and system complexity and long execution times. An alternative to running the actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. While past approaches lacked lossless, scalable trace collection, we contribute an approach that provides communication traces that are orders of magnitude smaller, if not near constant-size, regardless of the number of nodes, while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events, we develop a scheme to preserve time and causality of communication events, and we present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and on trace extrapolation. To the best of our knowledge, such a concise, scalable representation of MPI traces combined with time-preserving deterministic MPI call replay is without precedent.
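To make the idea of intra-node compression concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each MPI event is recorded as a (call name, peer offset relative to the caller's rank) pair and that repeated runs of events, such as those produced by a timestep loop, are folded into (repeat count, loop body) pairs. The names `compress`, `expand`, and `max_window` are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical event record: (MPI call name, peer offset relative to own rank).
Event = Tuple[str, int]
# Hypothetical compressed record: (repeat count, loop body).
Loop = Tuple[int, List[Event]]


def compress(events: List[Event], max_window: int = 8) -> List[Loop]:
    """Greedily fold back-to-back repeats of up to max_window events."""
    out: List[Loop] = []
    i, n = 0, len(events)
    while i < n:
        best_w, best_reps = 1, 1
        for w in range(1, max_window + 1):
            if i + 2 * w > n:
                break  # not enough events left for two copies of the body
            body = events[i:i + w]
            reps = 1
            # Count how many consecutive copies of the body follow position i.
            while events[i + reps * w:i + (reps + 1) * w] == body:
                reps += 1
            # Prefer the candidate covering the most events; ties keep the
            # shorter body, which yields the more compact record.
            if reps > 1 and reps * w > best_reps * best_w:
                best_w, best_reps = w, reps
        out.append((best_reps, events[i:i + best_w]))
        i += best_w * best_reps
    return out


def expand(compressed: List[Loop]) -> List[Event]:
    """Reconstruct the original event sequence (the compression is lossless)."""
    result: List[Event] = []
    for reps, body in compressed:
        result.extend(body * reps)
    return result
```

In this sketch, a trace consisting of an initialization call, a send/receive pair repeated a thousand times, and a finalization call shrinks to three records, and the record count does not grow with the iteration count. Storing peers as offsets rather than absolute ranks is the assumption that would let records from different ranks compare equal, which is the kind of property an inter-node merge step could exploit.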
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mueller, F., Wu, X., Schulz, M., de Supinski, B.R., Gamblin, T. (2012). ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale. In: Jónasson, K. (eds) Applied Parallel and Scientific Computing. PARA 2010. Lecture Notes in Computer Science, vol 7134. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28145-7_40
DOI: https://doi.org/10.1007/978-3-642-28145-7_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28144-0
Online ISBN: 978-3-642-28145-7
eBook Packages: Computer Science (R0)