Abstract
Developing scalable parallel applications for extreme-scale systems is challenging. The challenge of developing scalable parallel applications is only partially addressed by existing languages, compilers, and autotuners. As a result, manual performance tuning is often necessary to obtain high application performance. Rice University’s HPCToolkit is a suite of performance tools that supports innovative techniques for pinpointing and quantifying performance bottlenecks in fully optimized parallel programs with a measurement overhead of only a few percent. Many of these techniques were designed to leverage sampling for performance measurement, attribution, analysis, and presentation. This paper surveys some of HPCToolkit’s most interesting techniques and argues that sampling-based performance analysis is surprisingly versatile and effective.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsReferences
Adhianto, L., Banerjee, S., Fagan, M., Krentel, M., Marin, G., Mellor-Crummey, J., Tallent, N.R.: HPCToolkit: Tools for performance analysis of optimized parallel programs. Concurr. Comput. Pract. Exp. 22(6), 685–701 (2010)
Adhianto, L., Mellor-Crummey, J., Tallent, N.R.: Effectively presenting call path profiles of application performance. In: International Conference on Parallel Processing Workshops, pp. 179–188. IEEE Computer Society, Los Alamitos (2010)
Arnold, M., Ryder, B.G.: A framework for reducing the cost of instrumented code. In: Proceedings of the 2001 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 168–179. ACM, New York (2001)
Chung, I.H., Walkup, R.E., Wen, H.F., Yu, H.: MPI performance analysis tools on Blue Gene/L. In: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, p. 123. ACM, New York (2006)
Coarfa, C., Mellor-Crummey, J., Froyd, N., Dotsenko, Y.: Scalability analysis of SPMD codes using expectations. In: Proceedings of the 21st International Conference on Supercomputing, pp. 13–22. ACM, New York (2007)
De Rose, L., Homer, B., Johnson, D., Kaufmann, S., Poxon, H.: Cray performance analysis tools. In: Tools for High Performance Computing, pp. 191–199. Springer, Berlin (2008)
Free Software Foundation: Glibc. http://www.gnu.org/s/libc/ (2012)
Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 212–223. ACM, New York (1998)
Froyd, N., Mellor-Crummey, J., Fowler, R.: Low-overhead call path profiling of unmodified, optimized code. In: Proceedings of the 19th International Conference on Supercomputing, pp. 81–90. ACM, New York (2005)
Geimer, M., Wolf, F., Wylie, B.J.N., Ábrahám, E., Becker, D., Mohr, B.: The Scalasca performance toolset architecture. Concurr. Comput. Pract. Exp. 22(6), 702–719 (2010)
Hollingsworth, J.K., Miller, B.P., Cargille, J.: Dynamic program instrumentation for scalable performance tools. In: Proceedings of the 1994 Scalable High Performance Computing Conference, pp. 841–850. IEEE Computer Society, Los Alamitos, CA, USA (1994)
Knüpfer, A., Brunst, H., Doleschal, J., Jurenz, M., Lieber, M., Mickler, H., Müller, M.S., Nagel, W.E.: The Vampir performance analysis tool-set. In: Resch, M., Keller, R., Himmler, V., Krammer, B., Schulz, A. (eds.) Tools for High Performance Computing, pp. 139–155. Springer, Berlin (2008)
Liu, X., Mellor-Crummey, J.: Pinpointing data locality problems using data-centric analysis. In: Proceedings of the 2011 IEEE/ACM International Symposium on Code Generation and Optimization, Chamonix, France, pp. 171–180. IEEE Computer Society, Los Alamitos (2011)
Malony, A.D., Shende, S., Morris, A., Wolf, F.: Compensation of measurement overhead in parallel performance profiling. Int. J. High Perform. Comput. Appl. 21(2), 174–194 (2007)
Mellor-Crummey, J., Fowler, R., Marin, G., Tallent, N.: HPCView: a tool for top-down analysis of node performance. J. Supercomput. 23(1), 81–104 (2002)
Miller, B.P., Callaghan, M.D., Cargille, J.M., Hollingsworth, J.K., Irvin, R.B., Karavanic, K.L., Kunchithapadam, K., Newhall, T.: The Paradyn parallel performance measurement tool. Computer 28(11), 37–46 (1995)
Mosberger-Tang, D.: libunwind. http://www.nongnu.org/libunwind (2012)
Petrini, F., Kerbyson, D.J., Pakin, S.: The case of the missing supercomputer performance: achieving optimal performance on the 8,192 processors of ASCI Q. In: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, p. 55. IEEE Computer Society, Washington, DC (2003)
Rice University: HPCToolkit performance tools. http://hpctoolkit.org (2012)
Schulz, M., Galarowicz, J., Maghrak, D., Hachfeld, W., Montoya, D., Cranford, S.: Open | SpeedShop: an open source infrastructure for parallel performance analysis. Sci. Program. 16(2–3), 105–121 (2008)
Shende, S.S., Malony, A.D.: The TAU parallel performance system. Int. J. High Perform. Comput. Appl. 20(2), 287–311 (2006)
Tallent, N.R., Mellor-Crummey, J.: Effective performance measurement and analysis of multithreaded applications. In: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 229–240. ACM, New York (2009)
Tallent, N.R., Mellor-Crummey, J., Fagan, M.W.: Binary analysis for measurement and attribution of program performance. In: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 441–452. ACM, New York (2009)
Tallent, N.R., Mellor-Crummey, J.M.: Identifying performance bottlenecks in work-stealing computations. Computer 42(12), 44–50 (2009)
Tallent, N., Mellor-Crummey, J., Adhianto, L., Fagan, M., Krentel, M.: HPCToolkit: performance tools for scientific computing. J. Phys. Conf. Ser. 125, 012088 (5pp) (2008)
Tallent, N.R., Mellor-Crummey, J.M., Adhianto, L., Fagan, M.W., Krentel, M.: Diagnosing performance bottlenecks in emerging petascale applications. In: Proceedings of the 2009 ACM/IEEE Conference on Supercomputing, pp. 1–11. ACM, New York (2009)
Tallent, N.R., Adhianto, L., Mellor-Crummey, J.M.: Scalable identification of load imbalance in parallel executions using call path profiles. In: Proceedings of the 2010 ACM/IEEE Conference on Supercomputing, pp. 1–11. IEEE Computer Society, Washington, DC (2010)
Tallent, N.R., Mellor-Crummey, J.M., Porterfield, A.: Analyzing lock contention in multithreaded applications. In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 269–280. ACM, New York (2010)
Tallent, N.R., Mellor-Crummey, J.M., Franco, M., Landrum, R., Adhianto, L.: Scalable fine-grained call path tracing. In: Proceedings of the 25th International Conference on Supercomputing, pp. 63–74. ACM, New York (2011)
Traub, O., Schechter, S., Smith, M.D.: Ephemeral instrumentation for lightweight program profiling. Tech. rep., Harvard University (1999)
Wolf, F., Wylie, B.J.N., Ábrahám, E., Becker, D., Frings, W., Fürlinger, K., Geimer, M., Hermanns, M.A., Mohr, B., Moore, S., Pfeifer, M., Szebenyi, Z.: Usage of the Scalasca toolset for scalable performance analysis of large-scale parallel applications. In: Tools for High Performance Computing, pp. 157–167. Springer, Berlin (2008)
Acknowledgements
HPCToolkit would not be what it is without the efforts of Mark Krentel, Laksono Adhianto, and Mike Fagan. Xu Liu developed our data-centric analysis.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Tallent, N.R., Mellor-Crummey, J. (2012). Using Sampling to Understand Parallel Program Performance. In: Brunst, H., Müller, M., Nagel, W., Resch, M. (eds) Tools for High Performance Computing 2011. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31476-6_2
Download citation
DOI: https://doi.org/10.1007/978-3-642-31476-6_2
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31475-9
Online ISBN: 978-3-642-31476-6
eBook Packages: Computer ScienceComputer Science (R0)