Abstract
Some Distributed Shared Memory (DSM) and Cache-Only Memory Architecture (COMA) multiprocessors keep processes near the data they reference by transparently replicating remote data in the processes' local memories. This automatic replication of data can impose substantial memory system overhead on an application since all replicated data must be kept coherent. We examine the effect of task scheduling on data replication and memory system overhead due to coherency requirements. We show that simple policies using programmer hints can reduce memory coherence overhead in our workload applications.
This research was supported in part by NSF grant CCR-9113170.
Copyright information
© 1995 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Connelly, C., Ellis, C.S. (1995). Scheduling to reduce memory coherence overhead on coarse-grain multiprocessors. In: Feitelson, D.G., Rudolph, L. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 1995. Lecture Notes in Computer Science, vol 949. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60153-8_23
DOI: https://doi.org/10.1007/3-540-60153-8_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60153-1
Online ISBN: 978-3-540-49459-1
eBook Packages: Springer Book Archive