Trends in Algorithms for Nonuniform Applications on Hierarchical Distributed Architectures

  • Conference paper
Computational Aerosciences in the 21st Century

Part of the book series: ICASE LaRC Interdisciplinary Series in Science and Engineering (ICAS, volume 8)

Abstract

Scientific programmers are accustomed to expressing in their programs the “who” (variable declarations) and the “what” (operations), in some sequentialized order, and leaving to the systems software and hardware the questions of “when” and “where”. This act of delegation is appropriate at the small scales, since programmer management of pipelines, multiple functional units, and multilevel caches is presently beyond reward, and the depth and complexity of such performance-motivated architectural developments are sure to increase. However, disregard for the differential costs of accessing different locations in memory (the “flat memory” model) can put unnecessary amounts of synchronization and data motion on the critical path of program execution. Different organization of algorithms leading to mathematically equivalent results can have very different levels of exposed synchronization and data motion, and algorithmicists of the future will have to be conscious of and adapt to the distributed and hierarchical aspects of memory architecture.

Plenty of examples of architecturally motivated algorithmic adaptations can be given today; we illustrate herein with examples from recent aerodynamics simulations. For this purpose, pseudo-transient Newton-Krylov-Schwarz methods are briefly introduced and their parallel scalability in bulk synchronous SPMD applications is explored. We also indicate some fundamental limitations of bulk synchronous implicit solvers and propose asynchronous forms of nonlinear Schwarz methods as perhaps better adapted both to massively parallel architectures and strongly nonuniform applications. Suitably adapted PDE solvers seem to be readily extrapolated to the 100 Tflop/s capabilities envisioned in the coming decade. Making use of some novel quantitative metrics for the memory access efficiencies of high performance applications (“memtropy”) and for the local strength of nonlinearity (“tensoricity”) in applications with spatially nonuniform characteristics, we propose a migration path for scientific and engineering simulations towards the distributed and hierarchical Teraflops world, and we consider what simulations in this world will look like.

Copyright information

© 2000 Springer Science+Business Media Dordrecht

About this paper

Cite this paper

Keyes, D.E. (2000). Trends in Algorithms for Nonuniform Applications on Hierarchical Distributed Architectures. In: Salas, M.D., Anderson, W.K. (eds) Computational Aerosciences in the 21st Century. ICASE LaRC Interdisciplinary Series in Science and Engineering, vol 8. Springer, Dordrecht. https://doi.org/10.1007/978-94-010-0948-5_6

  • DOI: https://doi.org/10.1007/978-94-010-0948-5_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-3807-2

  • Online ISBN: 978-94-010-0948-5

  • eBook Packages: Springer Book Archive
