Abstract
Thomas Malthus, an English political economist who lived from 1766 to 1834, predicted that the earth's population would be limited by starvation, since population grows geometrically while the food supply grows only linearly. He wrote, "the power of population is indefinitely greater than the power in the earth to produce subsistence for man," thus defining the Malthusian Catastrophe. There is a parallel between this prediction and the conventional wisdom regarding super-large machines: application problem size and machine complexity are growing geometrically, yet mitigation techniques are improving only linearly.
To examine whether the largest machines are usable, the authors collected and examined component failure rates and Mean Time Between System Failure (MTBF) data from the world's largest production machines, including Oak Ridge National Laboratory's Jaguar and the University of Tennessee's Kraken. The authors also collected MTBF data for a variety of Cray XT series machines from around the world, together representing over 6 petaflops of compute power. An analysis of the data is provided, along with plans for future work. High performance computing's Malthusian Catastrophe has not happened yet, and advances in system resiliency should keep it at bay for many years to come.
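The scaling concern behind the abstract can be illustrated with a minimal sketch (not taken from the paper): under the common simplifying assumption that a machine's components fail independently with exponentially distributed lifetimes, the system-level MTBF falls in inverse proportion to the component count, so MTBF shrinks geometrically as machines grow. The function name and the example figures below are purely illustrative.

```python
def system_mtbf(component_mtbf_hours: float, n_components: int) -> float:
    """System MTBF assuming n_components identical parts that fail
    independently with exponential lifetimes: the system failure rate is
    the sum of the component rates, so MTBF divides by the part count."""
    return component_mtbf_hours / n_components

# Illustrative numbers only: parts rated at 1,000,000 hours MTBF in a
# machine with 10,000 such parts yield a system failure roughly every
# 100 hours; grow the machine 10x and the interval drops to 10 hours.
print(system_mtbf(1_000_000, 10_000))   # 100.0
print(system_mtbf(1_000_000, 100_000))  # 10.0
```

This simple inverse-scaling model is what makes the Malthusian framing plausible; the paper's empirical question is whether observed MTBF on real Cray XT systems actually degrades this fast.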
© 2012 Springer-Verlag Berlin Heidelberg
Kovatch, P., Ezell, M., Braby, R. (2012). The Malthusian Catastrophe Is Upon Us! Are the Largest HPC Machines Ever Up?. In: Alexander, M., et al. Euro-Par 2011: Parallel Processing Workshops. Euro-Par 2011. Lecture Notes in Computer Science, vol 7156. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29740-3_25
Print ISBN: 978-3-642-29739-7
Online ISBN: 978-3-642-29740-3