Scalability problems in multiprocessors with private caches

  • Michel Dubois
  • Luiz Barroso
  • Yung-Syau Chen
  • Koray Öner
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 605)


This paper addresses the performance problems caused by high memory access latencies and their effect on the scalability of shared-memory multiprocessor systems. We propose to take full advantage of weak ordering through lockup-free caches and delayed consistency protocols and to interconnect processors with point-to-point links. Simulation results show that these approaches are promising.




Copyright information

© Springer-Verlag 1992

Authors and Affiliations

  • Michel Dubois (1)
  • Luiz Barroso (1)
  • Yung-Syau Chen (1)
  • Koray Öner (1)

  1. Department of Electrical Engineering Systems, University of Southern California, Los Angeles
