Trends in Algorithms for Nonuniform Applications on Hierarchical Distributed Architectures

Keyes, David E.

doi:10.1007/978-94-010-0948-5_6


Part of the book series: ICASE LaRC Interdisciplinary Series in Science and Engineering (ICAS, volume 8)


Abstract

Scientific programmers are accustomed to expressing in their programs the “who” (variable declarations) and the “what” (operations), in some sequentialized order, and leaving to the systems software and hardware the questions of “when” and “where”. This act of delegation is appropriate at small scales, since programmer management of pipelines, multiple functional units, and multilevel caches is presently beyond reward, and the depth and complexity of such performance-motivated architectural developments are sure to increase. However, disregard for the differential costs of accessing different locations in memory (the “flat memory” model) can put unnecessary amounts of synchronization and data motion on the critical path of program execution. Different organizations of an algorithm that lead to mathematically equivalent results can expose very different amounts of synchronization and data motion, and the algorithmicists of the future will have to be conscious of, and adapt to, the distributed and hierarchical aspects of memory architecture.
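The claim that mathematically equivalent organizations of an algorithm can expose different amounts of synchronization can be made concrete with conjugate gradients. The sketch below is an illustration chosen for this summary, not code from the chapter: it compares textbook CG with the Chronopoulos-Gear reorganization, which fuses the two inner products of each iteration into a single reduction (the cost that de Sturler and van der Vorst address in the references). It runs serially; `global_reduce` is a hypothetical stand-in that only counts how many global reductions (for example, allreduce calls) a distributed run would need, and `laplacian_1d` is an assumed test operator.

```python
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def global_reduce(pairs: Sequence[Tuple[Vec, Vec]], syncs: List[int]) -> List[float]:
    """Compute several inner products in ONE simulated global reduction.

    On a distributed-memory machine this would be a single collective
    (e.g. one allreduce carrying len(pairs) scalars); here we only count calls.
    """
    syncs[0] += 1
    return [sum(a * b for a, b in zip(u, v)) for u, v in pairs]


def laplacian_1d(v: Vec) -> Vec:
    """Hypothetical SPD test operator: tridiag(-1, 2, -1), applied matrix-free."""
    n = len(v)
    return [2.0 * v[i] - (v[i - 1] if i > 0 else 0.0) - (v[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def cg_standard(A: Callable[[Vec], Vec], b: Vec, tol: float = 1e-10, max_iters: int = 100):
    """Textbook CG: two separate global reductions per iteration."""
    syncs = [0]
    x, r, p = [0.0] * len(b), b[:], b[:]
    (gamma,) = global_reduce([(r, r)], syncs)
    k = 0
    while k < max_iters and gamma ** 0.5 > tol:
        s = A(p)
        (ps,) = global_reduce([(p, s)], syncs)         # sync 1: (p, Ap)
        alpha = gamma / ps
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        (gamma_new,) = global_reduce([(r, r)], syncs)  # sync 2: (r, r)
        p = [ri + (gamma_new / gamma) * pi for ri, pi in zip(r, p)]
        gamma = gamma_new
        k += 1
    return x, k, syncs[0]


def cg_chronopoulos_gear(A: Callable[[Vec], Vec], b: Vec, tol: float = 1e-10, max_iters: int = 100):
    """Chronopoulos-Gear CG: (r, r) and (Ar, r) fused into one reduction per iteration."""
    syncs = [0]
    n = len(b)
    x, p, s, r = [0.0] * n, [0.0] * n, [0.0] * n, b[:]
    w = A(r)
    gamma, delta = global_reduce([(r, r), (w, r)], syncs)
    alpha = beta = gamma_old = 0.0
    k = 0
    while k < max_iters and gamma ** 0.5 > tol:
        if k == 0:
            alpha = gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha)  # equals gamma / (p, Ap)
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        s = [wi + beta * si for wi, si in zip(w, s)]        # s = A p by recurrence
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * si for ri, si in zip(r, s)]
        w = A(r)
        gamma_old = gamma
        gamma, delta = global_reduce([(r, r), (w, r)], syncs)  # the only sync
        k += 1
    return x, k, syncs[0]


if __name__ == "__main__":
    b = [float(i + 1) for i in range(8)]
    _, k1, n1 = cg_standard(laplacian_1d, b)
    _, k2, n2 = cg_chronopoulos_gear(laplacian_1d, b)
    print(f"standard: {k1} iterations, {n1} reductions")
    print(f"Chronopoulos-Gear: {k2} iterations, {n2} reductions")
```

For a solve that takes k iterations, the textbook loop performs 1 + 2k reductions and the reorganized loop 1 + k, with one matrix-vector product per iteration in both. The price of the reorganization is the extra recurrence for s = Ap, which can be somewhat less stable in finite precision.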

Plenty of examples of architecturally motivated algorithmic adaptations can be given today; we illustrate herein with examples from recent aerodynamics simulations. For this purpose, pseudo-transient Newton-Krylov-Schwarz methods are briefly introduced and their parallel scalability in bulk synchronous SPMD applications is explored. We also indicate some fundamental limitations of bulk synchronous implicit solvers and propose asynchronous forms of nonlinear Schwarz methods as perhaps better adapted both to massively parallel architectures and to strongly nonuniform applications. Suitably adapted PDE solvers seem to be readily extrapolated to the 100 Tflop/s capabilities envisioned in the coming decade. Making use of some novel quantitative metrics for the memory access efficiency of high-performance applications (“memtropy”) and for the local strength of nonlinearity (“tensoricity”) in applications with spatially nonuniform characteristics, we propose a migration path for scientific and engineering simulations toward the distributed and hierarchical Teraflops world, and we consider what simulations in this world will look like.
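The outer loop of a pseudo-transient Newton-Krylov-Schwarz method, pseudo-transient continuation, can be sketched on a toy problem. This is a minimal serial sketch under stated assumptions, not the chapter's aerodynamics solver: the nonlinear system is the 1D Bratu problem, the pseudo-time step follows the common switched evolution relaxation (SER) rule dt = dt0 ||F(u0)|| / ||F(u)||, and because the shifted Jacobian is tridiagonal here, a direct Thomas solve stands in for the Schwarz-preconditioned Krylov iteration a parallel code would use. All function names and parameter values are hypothetical.

```python
import math
from typing import List, Tuple

Vec = List[float]


def bratu_residual(u: Vec, lam: float, h: float) -> Vec:
    """F(u) for -u'' - lam*exp(u) = 0 on (0, 1) with u(0) = u(1) = 0 (centered differences)."""
    n = len(u)
    F = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        F.append((2.0 * u[i] - left - right) / h**2 - lam * math.exp(u[i]))
    return F


def solve_tridiagonal(lower: Vec, diag: Vec, upper: Vec, rhs: Vec) -> Vec:
    """Thomas algorithm (no pivoting); adequate here because the shifted Jacobian is SPD.

    Row i reads lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    """
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    for i in range(n):  # forward elimination
        m = diag[i] - (lower[i] * c[i - 1] if i > 0 else 0.0)
        c[i] = upper[i] / m if i < n - 1 else 0.0
        d[i] = (rhs[i] - (lower[i] * d[i - 1] if i > 0 else 0.0)) / m
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = d[i] - (c[i] * x[i + 1] if i < n - 1 else 0.0)
    return x


def psi_tc(lam: float = 1.0, n: int = 31, dt0: float = 0.1,
           rtol: float = 1e-10, max_steps: int = 200) -> Tuple[Vec, int, float]:
    """Pseudo-transient continuation with SER time-step growth on the 1D Bratu problem."""
    h = 1.0 / (n + 1)
    u = [0.0] * n
    F = bratu_residual(u, lam, h)
    r0 = math.sqrt(sum(f * f for f in F))
    rnorm, steps = r0, 0
    while rnorm > rtol * r0 and steps < max_steps:
        dt = dt0 * r0 / rnorm  # SER: dt grows as ||F|| falls
        # Solve (I/dt + F'(u)) du = -F(u); F'(u) is tridiagonal for this 1D problem.
        diag = [2.0 / h**2 - lam * math.exp(ui) + 1.0 / dt for ui in u]
        off = [-1.0 / h**2] * n
        du = solve_tridiagonal(off, diag, off, [-f for f in F])
        u = [ui + dui for ui, dui in zip(u, du)]
        F = bratu_residual(u, lam, h)
        rnorm = math.sqrt(sum(f * f for f in F))
        steps += 1
    return u, steps, rnorm / r0
```

With a small dt the update behaves like a damped implicit-Euler march toward steady state; as the residual falls, dt grows and the step approaches a pure Newton step. Calling `psi_tc()` returns the converged profile, the number of pseudo-time steps taken, and the final relative residual.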


References

Balay, S., Gropp, W.D., McInnes, L.C. and Smith, B.F., 1998. PETSc 2.0 Users Manual, Technical Report ANL-95/11, Revision 2.0.22, Argonne National Laboratory.
Baudet, G.M., 1978. Asynchronous iterative methods for multiprocessors. J. of the ACM 25, pp. 226–244.
Cai, X.-C., Gropp, W.D., Keyes, D.E., Melvin, R.G. and Young, D.P., 1998. Parallel Newton-Krylov-Schwarz algorithms for the transonic full potential equation. SIAM J. Scientific Computing 19, pp. 246–265.
Chazan, D. and Miranker, W., 1969. Chaotic relaxation. Linear Algebra and Its Applications 2, pp. 199–222.
Culler, D.E., Singh, J.P. and Gupta, A., 1998. Parallel Computer Architecture. Morgan Kaufmann.
Dennis, J.E. and Schnabel, R.B., 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall.
Dennis, J.E. and Torczon, V., 1991. Direct search methods on parallel machines. SIAM J. Optimization 1, pp. 448–474.
Department of Energy, 1998. Accelerated Strategic Computing Initiative. http://www.llnl.gov
de Sturler, E. and van der Vorst, H.A., 1995. Reducing the effect of global communication in GMRES(m) and CG on parallel distributed memory computers. Applied Numerical Mathematics 18, pp. 441–459.
Dryja, M. and Widlund, O.B., 1987. An Additive Variant of the Schwarz Alternating Method for the Case of Many Subregions, Technical Report #339, Courant Institute, NYU.
Federal Coordinating Council for Science, Engineering, and Technology, 1992. High Performance Computing and Communications Initiative. (See also http://www.hpcc.gov.)
Gao, G.R., Theobald, K.B., Marquez, A. and Sterling, T., 1997. The HTMT Program Execution Model, CAPSL TM-09, ECE Department, University of Delaware. (See also http://htmt.cacr.caltech.edu/publicat.htm.)
Gropp, W.D., Keyes, D.E., McInnes, L.C. and Tidriri, M.D., 1998. Globalized Newton-Krylov-Schwarz Algorithms and Software for Parallel Implicit CFD, ICASE Technical Report 98-24, 36 pp. (To appear in Int. J. of High Performance Computing Applications.)
Hestenes, M.R. and Stiefel, E., 1952. Methods of conjugate gradients for solving linear systems. J. Res. Nat. Bur. Stand. 49, pp. 409–435.
Hilbert, D., 1891. Über die stetige Abbildung einer Linie auf ein Flächenstück. Mathematische Annalen 38, pp. 459–460.
Ierotheou, C., Lai, C.-H., Palansuriya, C.J. and Pericleous, K.A., 1998. Simulation of 2-D metal cutting by means of a distributed algorithm. The Computer Journal 41, pp. 57–63.
Bilmes, J., Asanović, K., Chin, C.W. and Demmel, J., 1997. Optimizing matrix multiply using PHiPAC: a portable high-performance ANSI C methodology. In Proceedings of the International Conference on Supercomputing, Vienna, Austria, July 1997 (ACM SIGARC).
Karypis, G. and Kumar, V., 1998. Multilevel Algorithms for Multi-Constraint Graph Partitioning, Technical Report 98-019, CS Department, University of Minnesota.
Kaushik, D.K., Keyes, D.E. and Smith, B.F., 1998. On the interaction of architecture and algorithm in the domain-based parallelization of an unstructured grid incompressible flow code. In Proceedings of the Tenth International Conference on Domain Decomposition Methods, Mandel, J. et al., eds., AMS, pp. 311–319.
Kelley, C.T. and Keyes, D.E., 1998. Convergence analysis of pseudo-transient continuation. SIAM J. Numerical Analysis 35, pp. 508–523.
Keyes, D.E. and Gropp, W.D., 1987. A comparison of domain decomposition techniques for elliptic partial differential equations and their parallel implementation. SIAM J. Scientific and Statistical Computing 8, pp. s166–s202.
Keyes, D.E., Kaushik, D.K. and Smith, B.F., 1998. Prospects for CFD on petaflops systems. In CFD Review 1997, M. Hafez et al., eds., Wiley (to appear).
Lai, C.-H., 1997. An application of quasi-Newton methods for the numerical solution of interface problems. Advances in Engineering Software 28, pp. 333–339.
Lai, C.-H., Cuffe, A.M. and Pericleous, K.A., 1998. A defect equation approach for the coupling of subdomains in domain decomposition methods. Computers and Mathematics with Applications 35, pp. 81–94.
Miellou, J.C., 1975. Itérations chaotiques à retards. Comptes Rendus Sér. A 280, pp. 233–236.
National Science Foundation, 1996. Grand Challenges, National Challenges, Multidisciplinary Computing Challenges. http://www.cise.nsf.gov/general/grand_challenge.html
Peano, G., 1890. Sur une courbe qui remplit toute une aire plane. Mathematische Annalen 36, pp. 157–160.
Reid, J.K., 1971. On the method of conjugate gradients for the solution of large sparse systems of linear equations. In Large Sparse Sets of Linear Equations, J.K. Reid, ed., Academic Press, pp. 231–254.
Saad, Y. and Schultz, M.H., 1986. GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Scientific and Statistical Computing 7, pp. 856–869.
Semiconductor Industry Association, 1998. The National Technology Roadmap for Semiconductors, 1997 Edition. http://notes.sematech.org/97pelec.htm
Smith, B.F., Bjørstad, P.E. and Gropp, W.D., 1996. Domain Decomposition Methods. Cambridge University Press, Cambridge.
Schwarz, H.A., 1890. Über einen Grenzübergang durch alternierendes Verfahren [originally published in 1869]. In Gesammelte Mathematische Abhandlungen 2, Springer Verlag, pp. 133–134.
Söderlind, G., 1998. The Automatic Control of Numerical Integration, Technical Report LU-CS-TR:98-200, Lund Institute of Technology, Lund, Sweden.
Tseng, P., Bertsekas, D.P. and Tsitsiklis, J.N., 1990. Partially asynchronous parallel algorithms for network flow and other problems. SIAM J. Control and Optimization 28, pp. 678–710.
Warren, M. and Salmon, J., 1995. A parallel, portable and versatile treecode. In Seventh SIAM Conference on Parallel Processing for Scientific Computing, SIAM, Philadelphia, pp. 319–324.
Warren, M., Salmon, J.K., Becker, D.J., Goda, M.P., Sterling, T. and Winckelmans, G.S., 1997. Pentium Pro inside: I. A treecode at 430 Gigaflops on ASCI Red. II. Price/performance of $50/Mflop on Loki and Hyglac. In Supercomputing '97, IEEE Computer Society, Los Alamitos.
Whaley, R.C. and Dongarra, J., 1998. Automatically Tuned Linear Algebra Software. http://www.netlib.org/atlas/index.html


Author information

Authors and Affiliations

Department of Mathematics & Statistics, Old Dominion University, Norfolk, Virginia
David E. Keyes
Institute for Scientific Computing Research, Lawrence Livermore National Laboratory, Livermore, California
David E. Keyes
Institute for Computer Applications in Science and Engineering, NASA Langley Research Center, Hampton, Virginia
David E. Keyes


Editor information

Editors and Affiliations

ICASE, NASA Langley Research Center, Hampton, VA, USA
Manuel D. Salas
NASA Langley Research Center, Hampton, VA, USA
W. Kyle Anderson


About this paper

Cite this paper

Keyes, D.E. (2000). Trends in Algorithms for Nonuniform Applications on Hierarchical Distributed Architectures. In: Salas, M.D., Anderson, W.K. (eds) Computational Aerosciences in the 21st Century. ICASE LaRC Interdisciplinary Series in Science and Engineering, vol 8. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0948-5_6


DOI: https://doi.org/10.1007/978-94-010-0948-5_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-3807-2
Online ISBN: 978-94-010-0948-5
eBook Packages: Springer Book Archive
