
Tracking User-Perceived I/O Slowdown via Probing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11887)

Abstract

The perceived I/O performance of a shared file system depends heavily on the usage pattern expressed by all concurrent jobs. From the perspective of a single user or job, the achieved I/O throughput can vary significantly due to activities conducted by other users or by system services such as RAID rebuilds. As these activities are hidden, users wonder about the cause of the observed slowdown and may contact the service desk to report an unusually slow system.

In this paper, we present a methodology to investigate and quantify user-perceived slowdown, shedding light on the perceivable file system performance. This is achieved by deploying a monitoring system on a client node that constantly probes the performance of various data and metadata operations and then computes a slowdown factor. This information can be acquired and visualized in a timely fashion, informing users about the expected slowdown.

To evaluate the method, we deploy the monitoring on three data centers and explore the data gathered over periods of up to 60 days. The method is verified by investigating the metrics while running the IO-500 benchmark. We conclude that this approach is able to reveal both short-term and long-term interference.
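The probing idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation (see their io-probing repository for that): the operation set (a small write, read, and stat), the probe size, and the definition of the slowdown factor as the ratio of observed latency to a baseline median are all assumptions made for this example.

```python
import os
import statistics
import tempfile
import time


def probe_once(directory: str) -> dict:
    """Time one round of small data and metadata operations (illustrative set)."""
    timings = {}
    path = os.path.join(directory, "probe.tmp")

    t0 = time.perf_counter()
    with open(path, "wb") as f:           # data operation: small write
        f.write(b"x" * 4096)
    timings["write"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    with open(path, "rb") as f:           # data operation: small read
        f.read()
    timings["read"] = time.perf_counter() - t0

    t0 = time.perf_counter()
    os.stat(path)                         # metadata operation: stat
    timings["stat"] = time.perf_counter() - t0

    os.remove(path)
    return timings


def slowdown(observed: dict, baseline: dict) -> float:
    """Slowdown factor: worst ratio of observed latency to baseline median latency."""
    return max(observed[op] / baseline[op] for op in observed)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # Establish a baseline from repeated probes, then compare a fresh probe.
        samples = [probe_once(d) for _ in range(20)]
        baseline = {op: statistics.median(s[op] for s in samples)
                    for op in samples[0]}
        print("slowdown factor:", round(slowdown(probe_once(d), baseline), 2))
```

Running such a probe periodically (e.g. once per minute) from a regular client node captures the latency a user would actually experience, including contention caused by other jobs.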


Notes

  1. When the specified job walltime limit is hit, jobs are terminated.

  2. The value could be updated periodically in a sliding window to cover the typical operational conditions, or it could utilize statistics other than the median.

  3. http://www.archer.ac.uk.

  4. https://github.com/joobog/io-probing.

  5. To minimize this, the precreated file size could have been increased.

  6. https://github.com/hpc/ior.
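The sliding-window baseline suggested in note 2 can be sketched as follows. The window length, the probing cadence implied by it, and the use of the median are illustrative assumptions, not the authors' configuration.

```python
from collections import deque
from statistics import median


class SlidingBaseline:
    """Rolling window of recent probe latencies; the baseline is the window median.

    A bounded deque drops the oldest sample automatically, so the baseline
    tracks typical operational conditions as they drift over time.
    """

    def __init__(self, window: int = 1440):  # e.g. one day of minute-wise probes
        self.samples = deque(maxlen=window)

    def update(self, latency: float) -> None:
        self.samples.append(latency)

    def baseline(self) -> float:
        return median(self.samples)

    def slowdown(self, latency: float) -> float:
        return latency / self.baseline()


# Example: a latency spike stands out as a slowdown factor well above 1.
b = SlidingBaseline(window=5)
for t in [0.010, 0.011, 0.009, 0.010, 0.012]:
    b.update(t)
print(b.slowdown(0.050))  # about 5x the baseline median
```

Replacing `median` with another statistic (e.g. a high percentile) would make the baseline more or less sensitive to transient interference, which is exactly the trade-off the note alludes to.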


Acknowledgements

This work was supported by the UK National Supercomputing Service, ARCHER funded by EPSRC and NERC. We thank the German Climate Computing Center (DKRZ) for providing access to their machines to run the experiments.


Corresponding author

Correspondence to Julian Kunkel.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Kunkel, J., Betke, E. (2019). Tracking User-Perceived I/O Slowdown via Probing. In: Weiland, M., Juckeland, G., Alam, S., Jagode, H. (eds) High Performance Computing. ISC High Performance 2019. Lecture Notes in Computer Science(), vol 11887. Springer, Cham. https://doi.org/10.1007/978-3-030-34356-9_15


  • DOI: https://doi.org/10.1007/978-3-030-34356-9_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34355-2

  • Online ISBN: 978-3-030-34356-9

  • eBook Packages: Computer Science, Computer Science (R0)
