The impact of system design parameters on application noise sensitivity

Ferreira, Kurt B.; Bridges, Patrick G.; Brightwell, Ron; Pedretti, Kevin T.

doi:10.1007/s10586-011-0178-3

Cluster Computing, Volume 16, pages 117–129 (2013)

Published: 23 September 2011

Kurt B. Ferreira¹ ², Patrick G. Bridges², Ron Brightwell¹ & Kevin T. Pedretti¹

Abstract

Operating system (OS) noise, or jitter, is a key limiter of application scalability in high-end computing systems. Several studies have attempted to quantify the sources and effects of system interference, though few of these studies show the influence that architectural and system characteristics have on the impact of noise at scale. In this paper, we examine the impact of three such system properties: platform balance, noisy node distribution, and the choice of collective algorithm. Using a previously developed noise injection tool, we explore how the impact of noise varies with these platform characteristics. We provide detailed performance results that indicate that a system with relatively less network bandwidth is able to absorb more noise than a system with more network bandwidth. Our results also show that application performance can be significantly degraded by only a subset of noisy nodes. Furthermore, the placement of the noisy nodes matters, especially for applications that make substantial use of tree-based collective communication operations. Lastly, performance results indicate that non-blocking collective operations can greatly mitigate the impact of OS interference. Taken together, these results show that the impact of OS noise is not solely a property of application communication behavior, but is also influenced by other properties of the system architecture and system software environment.
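
One way to see why a subset of noisy nodes can slow an entire job is a toy bulk-synchronous model. This is not the authors' noise injection tool or experimental method; it is a minimal sketch under assumptions of our own: each noisy node is interrupted for a fixed `duration` every `period` microseconds with an independent random phase, every iteration ends in a global synchronization (for example an allreduce), and communication time is ignored. The function names `noisy_finish` and `bsp_runtime` and all parameter values are hypothetical.

```python
import random


def noisy_finish(start, work, phase, period, duration):
    """Hypothetical helper: finish time of `work` microseconds of computation
    that begins at `start` on a node whose noise events begin at
    phase + k * period and each steal `duration` microseconds.
    Simplification (assumption): an event already in progress at `start` is
    ignored. All arguments are integers in microseconds."""
    if not 0 <= duration < period:
        raise ValueError("model requires 0 <= duration < period")
    end = start + work
    # Earliest noise event starting at or after `start` (integer ceiling division).
    next_event = phase - ((phase - start) // period) * period
    while next_event < end:
        end += duration       # the interruption delays completion
        next_event += period  # the lengthened window may catch another event
    return end


def bsp_runtime(noisy_nodes, iterations, compute, period, duration, seed=0):
    """Hypothetical bulk-synchronous loop: each iteration every node computes
    for `compute` microseconds, then all nodes synchronize, so an iteration
    lasts as long as its slowest node. Noise-free nodes all finish at the same
    time, so only the noisy ones need simulating; communication is ignored."""
    rng = random.Random(seed)
    # Each noisy node gets an independent, unsynchronized noise phase.
    phases = [rng.randrange(period) for _ in noisy_nodes]
    t = 0
    for _ in range(iterations):
        slowest = t + compute  # finish time of any noise-free node
        for phase in phases:
            slowest = max(slowest, noisy_finish(t, compute, phase, period, duration))
        t = slowest  # global synchronization point
    return t


# Hypothetical parameters: 1000 us compute phases, 25 us detours every 1000 us.
base = bsp_runtime([], 1000, 1000, 1000, 25)
one = bsp_runtime([0], 1000, 1000, 1000, 25)
many = bsp_runtime(range(64), 1000, 1000, 1000, 25)
print(f"1 noisy node: {one / base:.3f}x, 64 noisy nodes: {many / base:.3f}x")
```

In this model, a compute phase one noise period long always contains at least one noise event, and the synchronization forces every node to wait for the slowest one, so a single noisy node among otherwise quiet nodes adds at least 25 µs to every iteration. Making 64 nodes noisy only pushes the per-iteration cost toward the 50 µs ceiling these parameters allow. The model deliberately omits collective algorithm structure, network bandwidth, and noisy node placement, which the paper reports also shape the outcome.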

Author information

Authors and Affiliations

¹ Scalable System Software Department, Sandia National Laboratories, Albuquerque, NM, 87185-1319, USA
Kurt B. Ferreira, Ron Brightwell & Kevin T. Pedretti
² Computer Science Department, The University of New Mexico, Albuquerque, NM, 87131, USA
Kurt B. Ferreira & Patrick G. Bridges

Corresponding author

Correspondence to Kurt B. Ferreira.

Additional information

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

About this article

Cite this article

Ferreira, K.B., Bridges, P.G., Brightwell, R. et al. The impact of system design parameters on application noise sensitivity. Cluster Comput 16, 117–129 (2013). https://doi.org/10.1007/s10586-011-0178-3

Received: 27 July 2011
Accepted: 11 August 2011
Issue Date: March 2013
