Abstract
Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best performance, not all parallel machines provide it. Instead, software layers using Shared Virtual Memory (SVM) can be built to enforce coherence at a higher level. In prior work, researchers have studied application-specific cache coherence protocols implemented either in SVM systems or as handlers run by programmable protocol processors. Because these protocols are specialized to the needs of a single application, they can be particularly helpful in reducing the long latencies and processing overhead that sometimes degrade performance in SVM systems. This paper studies implementing application-specific protocols in hardware, but not via an instruction-based protocol processor, as is typical. Instead, we consider configurable implementations based on Field-Programmable Gate Arrays (FPGAs). This approach can be faster than software-based techniques and less expensive than some hardware-based techniques. We study one application, appbt, in detail, including a VHDL-level design of the configurable protocol implementation. We sketch out approaches for other applications as well. Implementing protocol operations in configurable hardware improves communication performance by roughly 11X for a 32-node system. While overall speedups are a more modest 12%, our method is promising because of its flexibility and because it offers a new way of harnessing configurable hardware at the network interface, where it already exists or could be easily added to current systems.
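The gap between the roughly 11X communication improvement and the 12% overall speedup can be illustrated with Amdahl's law. The sketch below is not taken from the paper: it assumes that "12%" means an overall speedup factor of 1.12, that both figures refer to the same 32-node configuration, and that communication time is the only part of the run that is accelerated. The function names are hypothetical.

```python
def overall_speedup(comm_fraction: float, comm_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the communication
    fraction of the original run time is accelerated."""
    if not 0.0 <= comm_fraction <= 1.0:
        raise ValueError("comm_fraction must lie in [0, 1]")
    if comm_speedup <= 0.0:
        raise ValueError("comm_speedup must be positive")
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / comm_speedup)


def implied_comm_fraction(overall: float, comm_speedup: float) -> float:
    """Invert Amdahl's law: the fraction of original run time that must
    have been communication to yield the given overall speedup."""
    return (1.0 - 1.0 / overall) / (1.0 - 1.0 / comm_speedup)


# Assumed reading of the abstract's figures (see the caveats above).
fraction = implied_comm_fraction(overall=1.12, comm_speedup=11.0)
print(f"implied communication fraction: {fraction:.3f}")
```

Under these assumptions, the accelerated protocol operations would account for a little under 12% of the original run time, which is why a large communication speedup yields only a modest overall gain.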
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brooks, D., Martonosi, M. (1999). Implementing Application-Specific Cache-Coherence Protocols in Configurable Hardware. In: Sivasubramaniam, A., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 1999. Lecture Notes in Computer Science, vol 1602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10704826_13
DOI: https://doi.org/10.1007/10704826_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65915-0
Online ISBN: 978-3-540-48869-9
eBook Packages: Springer Book Archive