Effective Holistic Performance Measurement at Petascale Using IPM
As supercomputers are built from an ever-increasing number of processing elements, the effort required to achieve a substantial fraction of system peak performance grows continuously. Tools are needed that give developers and computing center staff holistic indicators of applications' resource consumption and of potential performance pitfalls at scale. To use the full potential of a supercomputer today, applications must incorporate multilevel parallelism (threading and message passing) and carefully orchestrate file I/O. As a consequence, performance tools must also be able to monitor these system components in an integrated way and at full machine scale. We present IPM, a modularized monitoring approach for MPI, OpenMP, file I/O, and other event sources.
Keywords: Hash Table, Parallel Region, Performance Analysis Tool, Hardware Performance Counter, OpenMP Application
This work was supported by the Bavaria-California Technology Center (BaCaTec) through the project “Performance and Workload Characterization for Multi-Core Supercomputers” and by the NSF under award OCI-0721397. This research was also supported by an allocation of advanced computing resources provided by the National Science Foundation. The computations were performed on Kraken (a Cray XT5) at the National Institute for Computational Sciences.