Abstract
With teraflops-scale computational modeling expected to be routine by 2003-04, under the roadmap of the Accelerated Strategic Computing Initiative (ASCI) of the U.S. Department of Energy, and with teraflops-capable platforms already available to a small group of users, attention naturally focuses on the next symbolically important milestone, computing at rates of 10^15 floating point operations per second, or “petaflop/s.” For architectural designs that are in any sense extrapolations of today’s, petaflops-scale computing will require approximately one-million-fold instruction-level concurrency. Given that cost-effective one-thousand-fold concurrency is challenging in practical computational fluid dynamics simulations today, algorithms are among the many possible bottlenecks to CFD on petaflops systems. After a general outline of the problems and prospects of petaflops computing, we examine the issue of algorithms for PDE computations in particular. A back-of-the-envelope parallel complexity analysis focuses on the latency of global synchronization steps in the implicit algorithm. We argue that the latency of synchronization steps is a fundamental, but addressable, challenge for PDE computations with static data structures, which are primarily determined by grids. We provide recent results with encouraging scalability for parallel implicit Euler simulations using the Newton-Krylov-Schwarz solver in the PETSc software library. The prospects for PDE simulations with dynamically evolving data structures are far less clear.
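The arithmetic behind the concurrency figure, and the way synchronization latency can erode parallel efficiency, can be illustrated with a small sketch. This is not the chapter's analysis: the 1 GHz per-unit rate, the per-point work, the latency value, and the log2(p) reduction cost are all illustrative assumptions, and the function names are hypothetical.

```python
import math

PETAFLOPS = 1e15  # target rate quoted in the abstract, in flop/s


def required_concurrency(target_rate, per_unit_rate):
    """Operations that must proceed simultaneously to reach target_rate."""
    return target_rate / per_unit_rate


def time_per_step(n_points, p, work_per_point, latency):
    """Toy cost model (an assumption, not the chapter's model):
    evenly divided local work plus one global reduction that
    costs log2(p) message latencies."""
    compute = n_points * work_per_point / p
    sync = latency * math.log2(p) if p > 1 else 0.0
    return compute + sync


def efficiency(n_points, p, work_per_point, latency):
    """Parallel efficiency T(1) / (p * T(p)) for a fixed problem size."""
    t1 = time_per_step(n_points, 1, work_per_point, latency)
    tp = time_per_step(n_points, p, work_per_point, latency)
    return t1 / (p * tp)


if __name__ == "__main__":
    # Assumed 1 GHz units, each retiring one flop per cycle.
    print(required_concurrency(PETAFLOPS, 1e9))
    # Assumed 1e9 grid points, 1e-7 s of work per point, 1e-5 s latency.
    for p in (1e3, 1e6):
        print(p, efficiency(1e9, p, 1e-7, 1e-5))
```

In this toy model, with the problem size held fixed, the synchronization term grows with log2(p) while the local work shrinks as 1/p, so efficiency falls as processors are added.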
© 2000 Springer Science+Business Media New York
About this paper
Cite this paper
Keyes, D.E., Kaushik, D.K., Smith, B.F. (2000). Prospects for CFD on Petaflops Systems. In: Bjørstad, P., Luskin, M. (eds) Parallel Solution of Partial Differential Equations. The IMA Volumes in Mathematics and its Applications, vol 120. Springer, New York, NY. https://doi.org/10.1007/978-1-4612-1176-1_11
DOI: https://doi.org/10.1007/978-1-4612-1176-1_11
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4612-7034-8
Online ISBN: 978-1-4612-1176-1
eBook Packages: Springer Book Archive