Abstract
Identifying performance bottlenecks in applications is crucial to improving their efficiency, but it can be difficult to precisely assess their impact on performance: in particular, two performance problems can interact, making each of them difficult to isolate and therefore to correct. We propose PAMDA, a methodology that singles out performance problems through hierarchical bottleneck detection. Important potential performance issues are classified in a ‘Performance Breakdown Tree’, which drives our iterative analysis cycle by prioritizing the most relevant problems. Our system relies on the MAQAO toolset and on differential code analysis. MAQAO is a performance analysis and optimization tool suite, while the differential analysis approach, implemented in the DECAN tool, consists of quantifying performance changes when controlled transformations are applied to the target code. We focus on performance issues raised by processors and memory subsystems in multicore architectures, and we demonstrate the approach on loops extracted from real-life HPC applications.
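To make the two ingredients above concrete, the following Python sketch scores each node of a toy ‘Performance Breakdown Tree’ by differential analysis (the cycles saved when a controlled transformation removes that issue) and then repeatedly descends into the costliest child. The node names, the cycle counts, the `Node`/`cost`/`drill_down` helpers, and the greedy descent rule are all hypothetical illustrations; they are not PAMDA's actual tree, DECAN's actual variants, or MAQAO's interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical node of a 'Performance Breakdown Tree'."""
    name: str
    # Cycles measured on a variant of the target loop in which this issue has
    # been removed by a controlled transformation (assumed to come from a
    # DECAN-like tool; here the value is simply supplied by hand).
    variant_cycles: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def cost(node: Node, reference_cycles: float) -> float:
    """Differential cost: cycles saved by removing the issue (0 if unmeasured)."""
    if node.variant_cycles is None:
        return 0.0
    return max(0.0, reference_cycles - node.variant_cycles)


def drill_down(root: Node, reference_cycles: float) -> List[str]:
    """Greedy analysis cycle: always refine the costliest child until a leaf."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda child: cost(child, reference_cycles))
        path.append(node.name)
    return path


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    reference = 100.0  # cycles per iteration of the original loop
    tree = Node("loop", children=[
        # Variant with memory accesses removed runs in 60 cycles -> cost 40.
        Node("memory accesses", variant_cycles=60.0, children=[
            Node("L1 misses", variant_cycles=70.0),  # all loads forced to hit L1
            Node("stores", variant_cycles=85.0),     # stores removed
        ]),
        # Variant with arithmetic removed runs in 90 cycles -> cost 10.
        Node("arithmetic", variant_cycles=90.0),
    ])
    print(drill_down(tree, reference))  # ['loop', 'memory accesses', 'L1 misses']
```

A real analysis would likely stop descending once a node's cost becomes negligible and would re-measure after each optimization, since removing one bottleneck can expose another; the greedy rule here is only one plausible way to prioritize the most relevant problems.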
Acknowledgements
We would like to thank Michel Masella for access to his POLARIS(MD) code, and Henri Calandra and Asma Farjallah for access to the RTM code.
This work has been carried out by the Exascale Computing Research laboratory, thanks to the support of CEA, GENCI, Intel, and UVSQ, and by the PRiSM laboratory, thanks to the support of the French Ministry for Economy, Industry, and Employment through the PERFCLOUD project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the CEA, GENCI, Intel, or UVSQ.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bendifallah, Z., Jalby, W., Noudohouenou, J., Oseret, E., Palomares, V., Rubial, A.C. (2014). PAMDA: Performance Assessment Using MAQAO Toolset and Differential Analysis. In: Knüpfer, A., Gracia, J., Nagel, W., Resch, M. (eds) Tools for High Performance Computing 2013. Springer, Cham. https://doi.org/10.1007/978-3-319-08144-1_9
DOI: https://doi.org/10.1007/978-3-319-08144-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08143-4
Online ISBN: 978-3-319-08144-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)