Implementing Application-Specific Cache-Coherence Protocols in Configurable Hardware

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1602)

Abstract

Streamlining communication is key to achieving good performance in shared-memory parallel programs. While full hardware support for cache coherence generally offers the best performance, not all parallel machines provide it. Instead, software layers using Shared Virtual Memory (SVM) can be built to enforce coherence at a higher level. In prior work, researchers have studied application-specific cache-coherence protocols implemented either in SVM systems or as handlers run by programmable protocol processors. Because these protocols are specialized to the needs of a single application, they can be particularly helpful in reducing the long latencies and processing overhead that sometimes degrade performance in SVM systems. This paper studies implementing application-specific protocols in hardware, but not via an instruction-based protocol processor, as is typical. Instead, we consider configurable implementations based on Field-Programmable Gate Arrays (FPGAs). This approach can be faster than software-based techniques and less expensive than some hardware-based techniques. We study one application, appbt, in detail, including a VHDL-level design of the configurable protocol hardware. We sketch out approaches for other applications as well. Implementing protocol operations in configurable hardware improves communication performance by roughly 11X for a 32-node system. While overall speedups are a more modest 12%, our method is promising because of its flexibility and because it offers a new way of harnessing configurable hardware at the network interface, where it already exists or could be easily added to current systems.
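
The gap between the roughly 11X communication improvement and the 12% overall speedup can be read through Amdahl's law. The sketch below is only an illustration, not the paper's analysis: it assumes the 12% figure is a speedup factor of 1.12, that only communication time is accelerated, and that the 11X factor applies uniformly to all of it. The function names are hypothetical.

```python
def implied_comm_fraction(overall_speedup: float, comm_speedup: float) -> float:
    """Baseline run-time fraction spent in communication, assuming Amdahl's law.

    Solves 1 / ((1 - f) + f / comm_speedup) = overall_speedup for f.
    """
    return (1.0 - 1.0 / overall_speedup) / (1.0 - 1.0 / comm_speedup)


def amdahl_speedup(comm_fraction: float, comm_speedup: float) -> float:
    """Overall speedup when only the communication fraction is accelerated."""
    return 1.0 / ((1.0 - comm_fraction) + comm_fraction / comm_speedup)


# Figures quoted in the abstract; everything derived from them is an assumption.
f = implied_comm_fraction(overall_speedup=1.12, comm_speedup=11.0)
print(f"implied communication fraction: {f:.3f}")
```

Under these assumptions, communication would account for a bit under 12% of the baseline run time, which is one way a large communication speedup can translate into a modest overall gain.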

Copyright information

© 1999 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brooks, D., Martonosi, M. (1999). Implementing Application-Specific Cache-Coherence Protocols in Configurable Hardware. In: Sivasubramaniam, A., Lauria, M. (eds) Network-Based Parallel Computing. Communication, Architecture, and Applications. CANPC 1999. Lecture Notes in Computer Science, vol 1602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10704826_13

  • DOI: https://doi.org/10.1007/10704826_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65915-0

  • Online ISBN: 978-3-540-48869-9

  • eBook Packages: Springer Book Archive
