Scheduling to reduce memory coherence overhead on coarse-grain multiprocessors

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 949)

Abstract

Some Distributed Shared Memory (DSM) and Cache-Only Memory Architecture (COMA) multiprocessors keep processes near the data they reference by transparently replicating remote data in the processes' local memories. This automatic replication of data can impose substantial memory system overhead on an application since all replicated data must be kept coherent. We examine the effect of task scheduling on data replication and memory system overhead due to coherency requirements. We show that simple policies using programmer hints can reduce memory coherence overhead in our workload applications.

This research was supported in part by NSF grant CCR-9113170.
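This page gives only the abstract, not the scheduling policies themselves. As a rough, hypothetical sketch of the idea the abstract describes, the C program below dispatches tasks from a shared queue and prefers tasks whose programmer-supplied "home node" hint matches the requesting node, so that data already resident locally is reused instead of being replicated remotely. The `task_t` structure, `home_node` hint field, and `pick_task` routine are illustrative assumptions, not the authors' interface.

```c
/* Hypothetical sketch of hint-based task scheduling: an idle node asks the
 * shared queue for work and is preferentially given a task whose data
 * already lives in that node's memory, falling back to any pending task.
 * None of these names come from the paper. */

#include <stdio.h>
#include <stdlib.h>

#define NUM_NODES 4
#define NUM_TASKS 16

typedef struct {
    int id;
    int home_node;   /* programmer hint: node whose memory holds this task's data */
    int done;
} task_t;

/* Prefer a pending task whose hint matches the requesting node; otherwise
 * return the first pending task (plain FIFO fallback). */
static task_t *pick_task(task_t *tasks, int n, int node)
{
    task_t *fallback = NULL;
    for (int i = 0; i < n; i++) {
        if (tasks[i].done)
            continue;
        if (tasks[i].home_node == node)
            return &tasks[i];     /* locality hit: no remote replication needed */
        if (fallback == NULL)
            fallback = &tasks[i]; /* locality miss: data would be replicated */
    }
    return fallback;
}

int main(void)
{
    task_t tasks[NUM_TASKS];
    for (int i = 0; i < NUM_TASKS; i++) {
        tasks[i].id = i;
        tasks[i].home_node = rand() % NUM_NODES;  /* stand-in for a real programmer hint */
        tasks[i].done = 0;
    }

    int remote = 0;
    int served = 0;
    /* Round-robin over nodes requesting work, as idle processors would. */
    while (served < NUM_TASKS) {
        for (int node = 0; node < NUM_NODES && served < NUM_TASKS; node++) {
            task_t *t = pick_task(tasks, NUM_TASKS, node);
            if (t == NULL)
                continue;
            t->done = 1;
            served++;
            if (t->home_node != node)
                remote++;
        }
    }
    printf("tasks run on a non-home node: %d of %d\n", remote, NUM_TASKS);
    return 0;
}
```

The intuition matches the abstract: a task run on the node that already holds its data needs no remote replication, so keeping coherent copies of that data imposes no extra memory system overhead.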

Editor information

Dror G. Feitelson, Larry Rudolph

Copyright information

© 1995 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Connelly, C., Ellis, C.S. (1995). Scheduling to reduce memory coherence overhead on coarse-grain multiprocessors. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1995. Lecture Notes in Computer Science, vol 949. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60153-8_23

  • DOI: https://doi.org/10.1007/3-540-60153-8_23

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-60153-1

  • Online ISBN: 978-3-540-49459-1
