Detecting Application Load Imbalance on High End Massively Parallel Systems

  • Luiz DeRose
  • Bill Homer
  • Dean Johnson
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)


Scientific applications must be well balanced in order to achieve high scalability on current and future high end massively parallel systems. However, identifying the sources of load imbalance in such applications is not a trivial exercise, and the current state of the art in performance analysis tools does not provide an efficient mechanism to help users identify the main areas of load imbalance in an application. In this paper we discuss a new set of metrics that we defined to identify and measure application load imbalance. We then describe the extensions that were made to the Cray performance measurement and analysis infrastructure to detect application load imbalance and present it to the user in an insightful way.
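The paper's own metric definitions are not reproduced in this abstract. As an illustrative sketch only, the following Python snippet computes two quantities commonly used to summarize load imbalance from per-processing-element (PE) times: imbalance time (how much the slowest PE exceeds the mean) and an imbalance percentage scaled so that one busy PE among otherwise idle PEs scores 100%. Both formulas are assumptions for illustration, not the metrics defined in the paper.

```python
# Illustrative only: these formulas are a common way to quantify load
# imbalance from per-PE times; they are NOT taken from the paper.

def imbalance_time(times):
    """Time by which the slowest PE exceeds the mean: max(t) - mean(t)."""
    return max(times) - sum(times) / len(times)

def imbalance_percent(times):
    """Imbalance as a fraction of the slowest PE's time, rescaled by
    n/(n-1) so that a single busy PE among idle PEs yields 100%."""
    n = len(times)
    t_max = max(times)
    t_mean = sum(times) / n
    if n == 1 or t_max == 0.0:
        return 0.0
    return 100.0 * (t_max - t_mean) / t_max * n / (n - 1)

# Example: per-PE times for one function, e.g. from a sampled profile.
times = [4.0, 4.0, 4.0, 8.0]
print(imbalance_time(times))     # 3.0 (slowest PE is 3 s above the mean)
print(imbalance_percent(times))  # 50.0
```

A metric like this can be computed per call-graph node, so that the nodes with the largest imbalance time indicate where rebalancing work would save the most wall-clock time.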


Keywords (machine-generated, not supplied by the authors): Load Balance · Processing Element · Parallel System · Load Imbalance · Call Graph



Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Luiz DeRose (1)
  • Bill Homer (1)
  • Dean Johnson (1)

  1. Cray Inc., Mendota Heights, MN, USA
