Online Performance Analysis with the Vampir Tool Set
Abstract
Today, performance analysis of parallel applications is mandatory to fully exploit the capabilities of modern HPC systems. Many performance analysis tools are available to support users in this challenging task. These tools usually follow one of two analysis methodologies. The majority, such as HPCToolkit or Vampir, follow a post-mortem approach: a measurement infrastructure records performance data during the application run and flushes it to the file system, and the tool performs its analysis steps after the application has finished, using the stored data. Post-mortem analysis has the disadvantage that potentially large data volumes must be handled by the machine's I/O subsystem. Tools following an online analysis approach mitigate this disadvantage by bypassing the I/O subsystem: their measurement infrastructure transfers the recorded performance data over the network directly to the tool's analysis components. This approach, however, has the limitation that the complete analysis must take place while the application is running. In this work, we present a prototype implementation of Vampir capable of online analysis. We discuss the advantages and disadvantages of both approaches and draw conclusions for the design of online performance analysis tools.
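The two workflows contrasted in the abstract can be illustrated with a minimal Python sketch. It is not the implementation of Vampir or of any measurement system; the event format, the function names (`generate_events`, `summarize`, `post_mortem`, `online`) and the use of an in-process `queue.Queue` as a stand-in for the network transport are hypothetical assumptions chosen for illustration. The post-mortem path writes each rank's events to per-rank files and analyzes them only after all writes are complete; the online path streams events through a bounded queue to an analysis loop that runs concurrently with the measurement threads.

```python
import json
import os
import queue
import tempfile
import threading
from collections import Counter

REGIONS = ["compute", "MPI_Send", "MPI_Recv"]


def generate_events(rank, n):
    """Measurement side (hypothetical format): n completed region visits with durations."""
    for i in range(n):
        yield {"rank": rank, "region": REGIONS[i % len(REGIONS)], "duration_us": 10 * (i % 5 + 1)}


def summarize(events):
    """Analysis step: total time per region, i.e. a flat profile."""
    totals = Counter()
    for ev in events:
        totals[ev["region"]] += ev["duration_us"]
    return dict(totals)


def post_mortem(num_ranks, n, trace_dir):
    # Measurement phase: every rank flushes its events to a file (the I/O subsystem).
    paths = []
    for rank in range(num_ranks):
        path = os.path.join(trace_dir, f"rank{rank}.jsonl")
        with open(path, "w") as f:
            for ev in generate_events(rank, n):
                f.write(json.dumps(ev) + "\n")
        paths.append(path)

    # Analysis phase: starts only after all trace files are complete.
    def read_all():
        for path in paths:
            with open(path) as f:
                for line in f:
                    yield json.loads(line)

    return summarize(read_all())


def online(num_ranks, n):
    # Assumption: a bounded in-memory queue stands in for the network link.
    channel = queue.Queue(maxsize=64)
    done = object()  # sentinel marking the end of one rank's event stream

    def measure(rank):
        for ev in generate_events(rank, n):
            channel.put(ev)  # blocks while the queue is full, i.e. analysis lags behind
        channel.put(done)

    producers = [threading.Thread(target=measure, args=(r,)) for r in range(num_ranks)]
    for t in producers:
        t.start()

    def consume():
        finished = 0
        while finished < num_ranks:
            ev = channel.get()
            if ev is done:
                finished += 1
            else:
                yield ev

    result = summarize(consume())  # analysis runs while measurement is still producing
    for t in producers:
        t.join()
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(post_mortem(num_ranks=4, n=100, trace_dir=d))
    print(online(num_ranks=4, n=100))
```

Both paths compute the same flat profile in this toy setting; they differ in where the data resides between measurement and analysis. The bounded queue also mirrors the limitation stated above: if the analysis cannot keep up, the measurement side stalls, because the complete analysis has to proceed at application runtime.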