A Survey About Quantitative Measurement of Performance Variability in High Performance Computers

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10561)

Abstract

Owing to less healthy components, contention for shared resources, operating system interference, and other factors, the components of high performance computers exhibit performance variability at runtime. As system scale and the parallelism of numerical simulation programs increase, the impact of this variability is magnified. It degrades performance and affects application scalability and overall system throughput. In this context, performance variability becomes an important question for both HPC systems and numerical simulation applications. Further research on this question will help system and application design for future exascale computing. This paper gives a literature review of the quantitative measurement of performance variability in HPC systems. We summarize the quantitative measurement methods for three components: computation, memory, and communication. Finally, we analyze the gap between existing research and the demands it must meet, and introduce potential research issues and future work.

This research is supported by the National Key R&D Plan of China (No. 2016YFBO201403) and the National Natural Science Foundation of China (No. 61672003).

Author information

Corresponding author

Correspondence to Linping Wu.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Wu, L., Xu, X., Wei, Y., Liu, X. (2017). A Survey About Quantitative Measurement of Performance Variability in High Performance Computers. In: Dou, Y., Lin, H., Sun, G., Wu, J., Heras, D., Bougé, L. (eds) Advanced Parallel Processing Technologies. APPT 2017. Lecture Notes in Computer Science, vol 10561. Springer, Cham. https://doi.org/10.1007/978-3-319-67952-5_7

  • DOI: https://doi.org/10.1007/978-3-319-67952-5_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67951-8

  • Online ISBN: 978-3-319-67952-5

  • eBook Packages: Computer Science, Computer Science (R0)
