Abstract
Producer-initiated mechanisms are added to invalidate-based systems to reduce communication latencies by transferring data as soon as it is produced. This paper compares the performance of three producer-initiated mechanisms: lock, deliver, and StreamLine. All three approaches outperform invalidate with prefetch in most cases.
Cache-based locks offer a 10–20% speedup over prefetch for two of the three benchmarks studied. StreamLine performs well in low-bandwidth environments but does not improve with increased bandwidth. Deliver is generally competitive with prefetch but does not offer a significant performance advantage overall.
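The latency argument behind producer-initiated communication can be illustrated with a deliberately simplified toy model. This is not the simulator or protocol used in the paper, and every number in it is hypothetical. It assumes that a consumer-initiated read of a line the producer has just written costs a three-hop miss in a directory protocol (request to home, forward to the dirty owner, data reply). It also assumes that a producer-initiated push (a "deliver"-style transfer) starts at write time and takes one hop to reach the consumer. The names `HOP`, `HIT` and `read_latency` are illustrative only.

```python
# Toy model only: all constants are hypothetical, not measurements from the paper.
HOP = 100  # hypothetical one-way network latency, in cycles
HIT = 1    # hypothetical local cache-hit latency, in cycles

# Hypothetical hop counts on the consumer's critical path.
INVALIDATE_MISS_HOPS = 3  # request -> home -> dirty owner -> data reply to consumer
DELIVER_PUSH_HOPS = 1     # producer -> consumer, launched when the data is written


def read_latency(mechanism: str, slack: int) -> int:
    """Cycles the consumer waits for a line the producer wrote `slack` cycles earlier."""
    if mechanism == "invalidate":
        # Consumer-initiated: the miss is issued only when the consumer reads,
        # so time between the write and the read does not hide any latency.
        return INVALIDATE_MISS_HOPS * HOP
    if mechanism == "deliver":
        # Producer-initiated: the push is already in flight, so only the part
        # of the transfer not overlapped with the slack is exposed.
        in_flight = DELIVER_PUSH_HOPS * HOP
        return max(0, in_flight - slack) + HIT
    raise ValueError(f"unknown mechanism: {mechanism}")


for slack in (0, 50, 200):
    print(slack, read_latency("invalidate", slack), read_latency("deliver", slack))
```

Under these assumptions, the consumer-initiated read always pays the full miss. The pushed data turns into a local hit once the producer writes far enough ahead of the consumer's read. The model deliberately ignores bandwidth, contention and prefetching, which the abstract identifies as deciding how the real mechanisms compare.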
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Byrd, G.T., Flynn, M.J. (1998). Evaluation of Communication Mechanisms in Invalidate-based Shared Memory Multiprocessors. In: Yalamanchili, S., Duato, J. (eds) Parallel Computer Routing and Communication. PCRCW 1997. Lecture Notes in Computer Science, vol 1417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-69352-1_14
DOI: https://doi.org/10.1007/3-540-69352-1_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-64571-9
Online ISBN: 978-3-540-69352-9
eBook Packages: Springer Book Archive