Effective Holistic Performance Measurement at Petascale Using IPM

  • Karl Fürlinger
  • Nicholas J. Wright
  • David Skinner
  • Christof Klausecker
  • Dieter Kranzlmüller
Conference paper


As supercomputers are built from an ever-increasing number of processing elements, the effort required to achieve a substantial fraction of system peak performance keeps growing. Tools are needed that give developers and computing center staff holistic indicators of an application's resource consumption and of potential performance pitfalls at scale. To use the full potential of a supercomputer today, applications must incorporate multilevel parallelism (threading and message passing) and carefully orchestrate file I/O. As a consequence, performance tools must be able to monitor these system components in an integrated way and at full machine scale. We present IPM, a modularized monitoring approach for MPI, OpenMP, file I/O, and other event sources.


Keywords (machine-generated, not supplied by the authors): Hash Table, Parallel Region, Performance Analysis Tool, Hardware Performance Counter, OpenMP Application



This work was supported by the Bavaria-California Technology Center (BaCaTec) through the project “Performance and Workload Characterization for Multi-Core Supercomputers” and by the NSF under award OCI-0721397. This research was also supported by an allocation of advanced computing resources provided by the National Science Foundation. The computations were performed on Kraken (a Cray XT5) at the National Institute for Computational Sciences.



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Karl Fürlinger (1, corresponding author)
  • Nicholas J. Wright (2)
  • David Skinner (2)
  • Christof Klausecker (3)
  • Dieter Kranzlmüller (4)

  1. University of California at Berkeley, Berkeley, USA
  2. Lawrence Berkeley National Laboratory, NERSC, Berkeley, USA
  3. Ludwig-Maximilians-Universität München (LMU), Munich, Germany
  4. Ludwig-Maximilians-Universität München (LMU) and Leibniz Supercomputing Centre (LRZ), Munich, Germany
