Abstract
Our work started from the need for an overview of applications’ I/O behavior that shows how our HPC system “Mistral” is used. We suspect that some applications use inefficient I/O patterns and are probably wasting a significant number of machine hours. To tackle the problem, we focus on detecting poor I/O performance, identifying the affected applications, and describing their I/O behavior.
Instead of gathering I/O statistics from global system variables, as many other monitoring tools do, our approach takes statistics directly from the I/O interfaces POSIX, MPI, HDF5 and NetCDF. To intercept I/O calls, we use an instrumentation library that is loaded dynamically via LD_PRELOAD at program startup.
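The idea of collecting statistics at the interface level can be illustrated with a minimal Python sketch. This is only an analogy and not the SIOX mechanism: an LD_PRELOAD library interposes on C symbols in the dynamic linker, whereas the sketch wraps Python's `os.write` and `os.read` explicitly. The `IOStats` class and its field names are hypothetical and chosen for illustration.

```python
import os
import time
from collections import defaultdict


class IOStats:
    """Hypothetical per-call counters, similar in spirit to what an
    interposition layer could record for each intercepted I/O function."""

    def __init__(self):
        self.calls = defaultdict(int)      # number of invocations per call name
        self.bytes = defaultdict(int)      # bytes transferred per call name
        self.seconds = defaultdict(float)  # accumulated wall time per call name

    def wrap(self, name, func, size_of):
        """Return a wrapper that forwards to func and records statistics.

        size_of maps the function's return value to a byte count.
        """
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            self.seconds[name] += time.perf_counter() - start
            self.calls[name] += 1
            self.bytes[name] += size_of(result)
            return result
        return wrapper


stats = IOStats()
traced_write = stats.wrap("write", os.write, lambda n: n)  # os.write returns a byte count
traced_read = stats.wrap("read", os.read, len)             # os.read returns the data itself

# Exercise the wrappers on a pipe so that no file system state is touched.
read_fd, write_fd = os.pipe()
traced_write(write_fd, b"hello")
traced_write(write_fd, b"world!")
data = traced_read(read_fd, 64)
os.close(read_fd)
os.close(write_fd)

for name in sorted(stats.calls):
    print(name, stats.calls[name], "calls,", stats.bytes[name], "bytes")
```

The wrapper sees every call together with its arguments and result, which is what makes per-application and per-interface statistics possible. System-wide counters cannot attribute traffic in this way.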
The HPC on-line monitoring framework is built on top of open source software: Grafana, SIOX, Elasticsearch and FUSE. It collects I/O statistics from applications and mount points. The latter are used for non-intrusive monitoring of virtual memory allocated with mmap(), i.e., no code adaptation is necessary. We evaluate the framework to show its effectiveness and discuss it critically.
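The following sketch suggests why memory-mapped I/O plausibly needs to be observed at the mount point. It assumes a POSIX-like system and uses only Python's standard `mmap` module. Data stored through a mapping reaches the file without the application issuing any write() call, so a wrapper around write() would have nothing to count. The temporary file is just a stand-in and not part of the framework described here.

```python
import mmap
import os
import tempfile

# Create a scratch file; a mapping needs a file with a non-zero size.
fd, path = tempfile.mkstemp()
try:
    os.ftruncate(fd, mmap.PAGESIZE)
    with mmap.mmap(fd, mmap.PAGESIZE) as region:
        # A plain memory store: the application never calls write() here.
        region[0:5] = b"hello"
        # Ask the kernel to write the dirty page back to the file.
        region.flush()
    # Read the file through the ordinary file API to confirm the data arrived.
    with open(path, "rb") as f:
        on_disk = f.read(5)
finally:
    os.close(fd)
    os.remove(path)

print(on_disk)
```

The write-back is carried out by the kernel and arrives at the file system layer, which is where a FUSE-based mount point is able to observe it.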
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Betke, E., Kunkel, J. (2017). Real-Time I/O-Monitoring of HPC Applications with SIOX, Elasticsearch, Grafana and FUSE. In: Kunkel, J., Yokota, R., Taufer, M., Shalf, J. (eds) High Performance Computing. ISC High Performance 2017. Lecture Notes in Computer Science, vol 10524. Springer, Cham. https://doi.org/10.1007/978-3-319-67630-2_15
DOI: https://doi.org/10.1007/978-3-319-67630-2_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67629-6
Online ISBN: 978-3-319-67630-2
eBook Packages: Computer Science, Computer Science (R0)