
Explicit Management of Memory Hierarchy

Chapter in Advances in High Performance Computing

Part of the book series: NATO ASI Series (ASHT, volume 30)


Abstract

All scalable parallel computers feature a memory hierarchy, in which some locations are “closer” to a particular processor than others. The hardware in a particular system may support a shared memory or message passing programming model, but these factors affect only the relative costs of local and remote accesses, not the system’s fundamental Non-Uniform Memory Access (NUMA) characteristics. Yet while the efficient management of memory hierarchies is fundamental to high performance in scientific computing, existing parallel languages and tools provide only limited support for this management task. Recognizing this deficiency, we propose abstractions and programming tools that can facilitate the explicit management of memory hierarchies by the programmer, and hence the efficient programming of scalable parallel computers. The abstractions comprise local arrays, global (distributed) arrays, and disk resident arrays located on secondary storage. The tools comprise the Global Arrays library, which supports the transfer of data between local and global arrays, and the Disk Resident Arrays (DRA) library, for transferring data between global and disk resident arrays. We describe the shared memory NUMA model implemented in the tools, discuss extensions for wide area computing environments, and review major applications of the tools, which currently total over one million lines of code.
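The programming model described here centers on an explicit get-compute-put idiom: a process copies a section of a global (distributed) array into fast local memory, operates on the local copy, and writes the result back. The sketch below illustrates this idiom using the Global Arrays C interface (GA_Initialize, NGA_Create, NGA_Distribution, NGA_Get, NGA_Put, GA_Sync). It is a minimal illustration, not code from the chapter; the original library was Fortran-oriented, and exact names and signatures vary across releases.

/* Minimal sketch of the local/global transfer model, assuming the
 * Global Arrays C interface; illustrative only, not the chapter's code. */
#include <stdlib.h>
#include <mpi.h>
#include "ga.h"
#include "macdecls.h"

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    GA_Initialize();

    /* Create a 1000 x 1000 double-precision global (distributed) array;
     * chunk = {-1, -1} lets the library choose the data distribution. */
    int dims[2]  = {1000, 1000};
    int chunk[2] = {-1, -1};
    int g_a = NGA_Create(C_DBL, 2, dims, "A", chunk);
    GA_Zero(g_a);

    /* Find the patch of the global array held by this process. */
    int me = GA_Nodeid();
    int lo[2], hi[2];
    NGA_Distribution(g_a, me, lo, hi);

    int rows = hi[0] - lo[0] + 1;
    int cols = hi[1] - lo[1] + 1;
    int ld[1] = {cols};                    /* leading dimension of local buffer */
    double *buf = malloc((size_t)rows * cols * sizeof(double));

    NGA_Get(g_a, lo, hi, buf, ld);         /* global -> local (explicit transfer) */
    for (int i = 0; i < rows * cols; i++)  /* compute on the fast local copy */
        buf[i] += 1.0;
    NGA_Put(g_a, lo, hi, buf, ld);         /* local -> global */
    GA_Sync();                             /* collective synchronization point */

    free(buf);
    GA_Destroy(g_a);
    GA_Terminate();
    MPI_Finalize();
    return 0;
}

The Disk Resident Arrays library extends the same model one level further down the hierarchy, providing collective operations that move array sections between global arrays and arrays stored on disk.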




Copyright information

© 1997 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Nieplocha, J., Harrison, R., Foster, I. (1997). Explicit Management of Memory Hierarchy. In: Grandinetti, L., Kowalik, J., Vajtersic, M. (eds) Advances in High Performance Computing. NATO ASI Series, vol 30. Springer, Dordrecht. https://doi.org/10.1007/978-94-011-5514-4_11


  • DOI: https://doi.org/10.1007/978-94-011-5514-4_11

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-010-6322-7

  • Online ISBN: 978-94-011-5514-4
