Implementing Application-Specific Cache-Coherence Protocols in Configurable Hardware

Brooks, David; Martonosi, Margaret

doi:10.1007/10704826_13


Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1602)

Abstract

Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best performance, not all parallel machines provide it. Instead, software layers using Shared Virtual Memory (SVM) can be built to enforce coherence at a higher level. In prior work, researchers have studied application-specific cache-coherence protocols implemented either in SVM systems or as handlers run by programmable protocol processors. Because these protocols are specialized to the needs of a single application, they can be particularly helpful in reducing the long latencies and processing overhead that sometimes degrade performance in SVM systems. This paper studies implementing application-specific protocols in hardware, but not via an instruction-based protocol processor as is typical. Instead, we consider configurable implementations based on Field-Programmable Gate Arrays (FPGAs). This approach can be faster than software-based techniques and less expensive than some hardware-based techniques. We study one application, appbt, in detail, including a VHDL-level design of the configurable protocol hardware. We also sketch approaches for other applications. Implementing protocol operations in configurable hardware improves communication performance by roughly 11X for a 32-node system. While overall speedups are a more modest 12%, our method is promising because of its flexibility and because it offers a new way of harnessing configurable hardware at the network interface, where it already exists or could easily be added to current systems.
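To see how an 11X communication improvement can coexist with a 12% overall speedup, the two figures can be related through Amdahl's law. The sketch below is an illustration only: the abstract does not say the authors used this model, and both function names are hypothetical. It assumes the accelerated protocol operations are the only part of the run time that changes, and it treats the rounded figures from the abstract as exact.

```python
def accelerated_fraction(overall_speedup: float, component_speedup: float) -> float:
    """Solve Amdahl's law, S = 1 / ((1 - f) + f / k), for the fraction f.

    S is the overall speedup and k is the speedup of the accelerated part.
    Assumption (not from the paper): only that part of the run time changes.
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / component_speedup)


def overall_speedup(fraction: float, component_speedup: float) -> float:
    """Amdahl's law: overall speedup when `fraction` of time is sped up k-fold."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


# Figures quoted in the abstract: roughly 11X communication, 12% overall.
f = accelerated_fraction(1.12, 11.0)
print(f"{f:.3f}")  # prints 0.118
```

Under these assumptions, about 12% of the original run time would be spent in the accelerated protocol operations, which is why a large improvement in communication yields a modest overall gain.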



Author information

Authors and Affiliations

Dept. of Electrical Engineering, Princeton University,
David Brooks & Margaret Martonosi

Editor information

Editors and Affiliations

Department of Computer Science and Engineering,
Anand Sivasubramaniam
Department of Computer Science and Engineering, University of California, San Diego, 92093-0114, La Jolla, CA, USA
Mario Lauria

About this paper

Cite this paper

Brooks, D., Martonosi, M. (1999). Implementing Application-Specific Cache-Coherence Protocols in Configurable Hardware. In: Sivasubramaniam, A., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 1999. Lecture Notes in Computer Science, vol 1602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10704826_13

DOI: https://doi.org/10.1007/10704826_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65915-0
Online ISBN: 978-3-540-48869-9
eBook Packages: Springer Book Archive
