Profiling of Task-Based Applications on Shared Memory Machines: Scalability and Bottlenecks

  • Ralf Hoffmann
  • Thomas Rauber
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)


A sophisticated approach to the parallel execution of irregular applications on shared memory machines is to decompose them into fine-grained tasks. These tasks can be executed using a task pool, which handles their scheduling independently of the application. In this paper, we present a transparent way to profile irregular applications that use task pools without modifying the application's source code. We show that it is possible to identify critical tasks that prevent scalability and to locate bottlenecks inside the application. We also show that the profiling information can be used to derive a coarse estimate of the execution time for a given number of processors.
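To make the idea concrete, the following is a minimal sketch in Python of a task pool that records the execution time of every task inside the pool itself, so the application code only submits tasks and is not instrumented. It is not the implementation described in the paper: the class `ProfilingTaskPool`, the functions `coarse_estimate`, `root`, and `leaf`, and the estimation rule (the larger of total work divided by the processor count and the longest single task) are all assumptions chosen for illustration.

```python
import queue
import threading
import time


class ProfilingTaskPool:
    """Hypothetical central task pool that times every task it executes."""

    def __init__(self, num_workers):
        self.num_workers = num_workers
        self._tasks = queue.Queue()
        self._lock = threading.Lock()
        self.records = []  # list of (task name, seconds spent executing)

    def submit(self, func, *args):
        """Add a task; running tasks may call this to spawn child tasks."""
        self._tasks.put((func, args))

    def _worker(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: no more work
                self._tasks.task_done()
                return
            func, args = item
            start = time.perf_counter()
            try:
                func(self, *args)  # the task receives the pool to spawn children
            finally:
                elapsed = time.perf_counter() - start
                with self._lock:
                    self.records.append((func.__name__, elapsed))
                self._tasks.task_done()

    def run(self):
        """Run all submitted tasks, including dynamically created ones."""
        threads = [threading.Thread(target=self._worker)
                   for _ in range(self.num_workers)]
        for t in threads:
            t.start()
        self._tasks.join()  # returns once every task has finished
        for _ in threads:
            self._tasks.put(None)
        for t in threads:
            t.join()


def coarse_estimate(records, processors):
    """Assumed rule of thumb: max(total work / p, longest single task)."""
    total = sum(seconds for _, seconds in records)
    longest = max((seconds for _, seconds in records), default=0.0)
    return max(total / processors, longest)


def leaf(pool, n):
    sum(range(n))  # stand-in for real work


def root(pool, n):
    pool.submit(leaf, n)  # irregular applications create tasks at run time
    pool.submit(leaf, n)


if __name__ == "__main__":
    pool = ProfilingTaskPool(num_workers=2)
    pool.submit(root, 100_000)
    pool.run()
    for name, seconds in pool.records:
        print(f"{name}: {seconds:.6f} s")
    print("estimate for 4 processors:", coarse_estimate(pool.records, 4))
```

In this sketch, a task whose recorded time dominates the total is a candidate for a critical task, because no number of processors can push the estimate below the duration of the longest task. The estimate ignores task dependencies and synchronization overhead, so it should be read as a rough lower bound at best.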


Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Ralf Hoffmann (1)
  • Thomas Rauber (1)
  1. Department for Mathematics, Physics and Computer Science, University of Bayreuth, Germany