High-Performance Memory Systems

  • Amos R. Omondi


A high-performance pipelined machine requires a memory system of equally high performance if the processor is to be fully utilized. This chapter deals with the issue of matching processor and memory performance. In a simple machine in which a single main memory module is connected directly to a processor (with no intermediate storage), the effective rate at which operands and instructions can be processed is limited by the rate at which the memory unit can deliver them. In a high-performance machine, however, the access time of a single main memory unit typically exceeds the cycle time of the processor by a large margin, and obtaining the highest possible performance requires some changes in the basic design of the memory system. Developing faster memory is not a viable solution to this problem, because of limitations in the technology available within a given period: the performance of the technology used for main memories typically improves at a slower rate than that of the technology used for processors, and using the fastest available logic for main memory is never cost-effective.
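The mismatch described above can be illustrated with a small calculation. The following is a minimal sketch, not taken from the chapter: it assumes a deliberately simplified model in which the processor needs one memory access per cycle and the memory serves one access at a time, and the function name and the timing figures are hypothetical.

```python
def processor_utilization(cycle_ns: float, access_ns: float) -> float:
    """Upper bound on processor utilization in a simplified model.

    Assumptions (illustrative only): every processor cycle needs one
    item from memory, a single memory module serves one access at a
    time, and there is no intermediate storage between the two.
    """
    if cycle_ns <= 0 or access_ns <= 0:
        raise ValueError("times must be positive")
    # The processor can consume one item per cycle, but memory can only
    # deliver one item per access time, so the slower of the two rates
    # bounds the useful work done.
    return min(1.0, cycle_ns / access_ns)


if __name__ == "__main__":
    # Hypothetical figures: a 10 ns processor cycle and a 50 ns memory access.
    print(processor_utilization(10.0, 50.0))
```

Under these assumptions, a memory whose access time is several times the processor cycle time leaves the processor idle for most cycles, which is the gap that changes to the memory-system design are meant to close.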


Main Memory; Memory Location; Cache Line; Memory Bank; Instruction Cache
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer Science+Business Media New York 1999

Authors and Affiliations

  • Amos R. Omondi, Department of Computer Science, Flinders University, Adelaide, Australia
