Abstract
All scalable parallel computers feature a memory hierarchy, in which some locations are “closer” to a particular processor than others. The hardware in a particular system may support a shared memory or message passing programming model, but these factors affect only the relative costs of local and remote accesses, not the system’s fundamental Non-Uniform Memory Access (NUMA) characteristics. Yet while the efficient management of memory hierarchies is fundamental to high performance in scientific computing, existing parallel languages and tools provide only limited support for this management task. Recognizing this deficiency, we propose abstractions and programming tools that facilitate the explicit management of memory hierarchies by the programmer, and hence the efficient programming of scalable parallel computers. The abstractions comprise local arrays, global (distributed) arrays, and disk resident arrays located on secondary storage. The tools comprise the Global Arrays library, which supports the transfer of data between local and global arrays, and the Disk Resident Arrays (DRA) library, which transfers data between global and disk resident arrays. We describe the shared memory NUMA model implemented in the tools, discuss extensions for wide area computing environments, and review major applications of the tools, which currently total over one million lines of code.
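The three-level hierarchy described above (local arrays in a process's own memory, a global array partitioned across processes, and a disk resident array on secondary storage) can be illustrated with a small, self-contained sketch. This is a conceptual toy model, not the actual Global Arrays or DRA API; all names here (`ToyGlobalArray`, `ToyDiskArray`, `put`, `get`) are illustrative assumptions, and the simulated "processes" are just per-owner buffers in one address space.

```python
# Toy model (NOT the Global Arrays / DRA API) of explicit data movement
# between the three levels of the hierarchy: local <-> global <-> disk.
import array
import os
import tempfile

class ToyGlobalArray:
    """A 1-D 'global' array block-partitioned across nproc simulated owners."""
    def __init__(self, n, nproc):
        self.n = n
        self.block = (n + nproc - 1) // nproc
        # one locally held block per simulated process
        self.parts = [array.array('d', [0.0] * self.block) for _ in range(nproc)]

    def put(self, lo, data):
        # one-sided put: scatter a local array section into the owning blocks
        for i, v in enumerate(data):
            g = lo + i
            self.parts[g // self.block][g % self.block] = v

    def get(self, lo, hi):
        # one-sided get: gather a global section into a fresh local array
        return array.array('d', (self.parts[g // self.block][g % self.block]
                                 for g in range(lo, hi)))

class ToyDiskArray:
    """A 'disk resident' array backed by a binary file of 8-byte doubles."""
    def __init__(self, path, n):
        self.path = path
        with open(path, 'wb') as f:
            f.write(array.array('d', [0.0] * n).tobytes())

    def write(self, lo, data):          # global -> disk transfer
        with open(self.path, 'r+b') as f:
            f.seek(lo * 8)
            f.write(data.tobytes())

    def read(self, lo, hi):             # disk -> global transfer
        with open(self.path, 'rb') as f:
            f.seek(lo * 8)
            out = array.array('d')
            out.frombytes(f.read((hi - lo) * 8))
            return out

# Move data explicitly down and back up the hierarchy:
local = array.array('d', [1.0, 2.0, 3.0, 4.0])
g = ToyGlobalArray(n=8, nproc=2)
g.put(2, local)                                   # local  -> global
d = ToyDiskArray(os.path.join(tempfile.mkdtemp(), 'dra.bin'), 8)
d.write(0, g.get(0, 8))                           # global -> disk
print(list(d.read(2, 6)))                         # -> [1.0, 2.0, 3.0, 4.0]
```

The point of the sketch is the programming model, not the implementation: every transfer between levels is an explicit, programmer-initiated copy of an array section, which is exactly the style of hierarchy management the abstract advocates.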
Copyright information
© 1997 Springer Science+Business Media Dordrecht
Cite this chapter
Nieplocha, J., Harrison, R., Foster, I. (1997). Explicit Management of Memory Hierarchy. In: Grandinetti, L., Kowalik, J., Vajtersic, M. (eds) Advances in High Performance Computing. NATO ASI Series, vol 30. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5514-4_11
DOI: https://doi.org/10.1007/978-94-011-5514-4_11
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-010-6322-7
Online ISBN: 978-94-011-5514-4