Cluster Computing, Volume 11, Issue 1, pp 57–73

Integrated parallel performance views

  • Aroon Nataraj
  • Allen D. Malony
  • Sameer Shende
  • Alan Morris


The influence of the operating system and system-specific effects on application performance is an increasingly important consideration in high-performance computing. OS kernel measurement is key to understanding these influences and the interrelationship of system-level and user-level performance factors. The KTAU (Kernel TAU) methodology and Linux-based framework provide parallel kernel performance measurement from both a kernel-wide and a process-centric perspective. The kernel-wide perspective characterizes aggregate kernel performance for the entire system. The process-centric perspective characterizes kernel performance when the kernel runs in the context of a particular process. KTAU extends the TAU performance system with kernel-level monitoring while leveraging TAU’s measurement and analysis capabilities. We explain the rationale and motivations behind our approach, describe the KTAU design and implementation, and show working examples on multiple platforms that demonstrate the versatility of KTAU in integrated system/application monitoring.
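To make the two perspectives concrete, here is a minimal sketch. It is not KTAU's implementation or API. It assumes a hypothetical list of kernel event records, each tagged with the process in whose context the kernel was running; the record layout, the event names, and both function names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical kernel event samples: (pid, kernel_event, microseconds).
# These records are invented for illustration; they are not KTAU output.
SAMPLES = [
    (101, "sys_read", 40),
    (101, "schedule", 15),
    (202, "sys_read", 25),
    (202, "do_IRQ", 10),
    (101, "sys_read", 5),
]


def kernel_wide_view(samples):
    """Sum time per kernel event across every process on the system."""
    totals = defaultdict(int)
    for _pid, event, usec in samples:
        totals[event] += usec
    return dict(totals)


def process_centric_view(samples, pid):
    """Sum time per kernel event, counting only kernel activity that
    ran in the context of the given process."""
    totals = defaultdict(int)
    for sample_pid, event, usec in samples:
        if sample_pid == pid:
            totals[event] += usec
    return dict(totals)
```

Calling `kernel_wide_view(SAMPLES)` aggregates over all processes, while `process_centric_view(SAMPLES, 101)` restricts the same aggregation to one process, which is the distinction the abstract draws between the two views.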


Parallel performance, Kernel, Linux, Instrumentation, Measurement





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  • Aroon Nataraj (1)
  • Allen D. Malony (1)
  • Sameer Shende (1)
  • Alan Morris (1)
  1. Department of Computer and Information Science, University of Oregon, Eugene, USA
