Scheduling to reduce memory coherence overhead on coarse-grain multiprocessors

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 949)

Abstract

Some Distributed Shared Memory (DSM) and Cache-Only Memory Architecture (COMA) multiprocessors keep processes near the data they reference by transparently replicating remote data in the processes' local memories. This automatic replication of data can impose substantial memory system overhead on an application since all replicated data must be kept coherent. We examine the effect of task scheduling on data replication and memory system overhead due to coherency requirements. We show that simple policies using programmer hints can reduce memory coherence overhead in our workload applications.

This research was supported in part by NSF grant CCR-9113170.
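This page gives only the abstract, not the scheduling policies themselves. As a rough, hypothetical sketch of the idea the abstract describes, the C program below dispatches tasks from a shared queue and prefers tasks whose programmer-supplied "home node" hint matches the requesting node, so that data already resident locally is reused instead of being replicated remotely. The `task_t` structure, `home_node` hint field, and `pick_task` routine are illustrative assumptions, not the authors' interface.

```c
/* Hypothetical sketch of hint-based task scheduling: an idle node asks the
 * shared queue for work and is preferentially given a task whose data
 * already lives in that node's memory, falling back to any pending task.
 * None of these names come from the paper. */

#include <stdio.h>
#include <stdlib.h>

#define NUM_NODES 4
#define NUM_TASKS 16

typedef struct {
    int id;
    int home_node;   /* programmer hint: node whose memory holds this task's data */
    int done;
} task_t;

/* Prefer a pending task whose hint matches the requesting node; otherwise
 * return the first pending task (plain FIFO fallback). */
static task_t *pick_task(task_t *tasks, int n, int node)
{
    task_t *fallback = NULL;
    for (int i = 0; i < n; i++) {
        if (tasks[i].done)
            continue;
        if (tasks[i].home_node == node)
            return &tasks[i];     /* locality hit: no remote replication needed */
        if (fallback == NULL)
            fallback = &tasks[i]; /* locality miss: data would be replicated */
    }
    return fallback;
}

int main(void)
{
    task_t tasks[NUM_TASKS];
    for (int i = 0; i < NUM_TASKS; i++) {
        tasks[i].id = i;
        tasks[i].home_node = rand() % NUM_NODES;  /* stand-in for a real programmer hint */
        tasks[i].done = 0;
    }

    int remote = 0;
    int served = 0;
    /* Round-robin over nodes requesting work, as idle processors would. */
    while (served < NUM_TASKS) {
        for (int node = 0; node < NUM_NODES && served < NUM_TASKS; node++) {
            task_t *t = pick_task(tasks, NUM_TASKS, node);
            if (t == NULL)
                continue;
            t->done = 1;
            served++;
            if (t->home_node != node)
                remote++;
        }
    }
    printf("tasks run on a non-home node: %d of %d\n", remote, NUM_TASKS);
    return 0;
}
```

The intuition matches the abstract: a task run on the node that already holds its data needs no remote replication, so keeping coherent copies of that data imposes no extra memory system overhead.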

Editor information

Dror G. Feitelson, Larry Rudolph

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Connelly, C., Ellis, C.S. (1995). Scheduling to reduce memory coherence overhead on coarse-grain multiprocessors. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1995. Lecture Notes in Computer Science, vol 949. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60153-8_23

  • DOI: https://doi.org/10.1007/3-540-60153-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60153-1

  • Online ISBN: 978-3-540-49459-1
