Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability

In: Transactions on High-Performance Embedded Architectures and Compilers V

Abstract

We present the hardware design and implementation of a local memory system for individual processors inside future chip multiprocessors (CMPs). Our memory system supports both implicit communication via caches and explicit communication via directly accessible local (“scratchpad”) memories and remote DMA (RDMA). We provide run-time configurability of the SRAM blocks that lie near each processor, so that portions of them operate as second-level (local) cache, while the rest operate as scratchpad. We also strive to merge the communication subsystems required by the cache and scratchpad into one integrated Network Interface (NI) and Cache Controller (CC), in order to economize on circuits. The processor interacts with the NI at user level through virtualized command areas in scratchpad; the NI uses a similar access mechanism to provide efficient support for two hardware synchronization primitives: counters and queues. We describe the NI design, the hardware cost, and the latencies of our FPGA-based prototype implementation, which integrates four MicroBlaze processors, each with 64 KBytes of local SRAM, a crossbar NoC, and a DRAM controller. One-way, end-to-end, user-level communication completes within about 20 clock cycles for short transfer sizes.
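
As a rough illustration of two ideas in the abstract, the following Python sketch models a node's local SRAM whose lines are tagged as cache or scratchpad at run time, and a one-line remote write whose completion is signalled through a counter. It is a toy software model under stated assumptions, not the chapter's hardware interface: the names `Node`, `LineMode`, `Counter`, and `issue_rdma`, the line sizes, and the "fire when the counter reaches zero" behaviour are all hypothetical, and the command-area descriptor format is not modelled.

```python
from dataclasses import dataclass, field
from enum import Enum


class LineMode(Enum):
    CACHE = "cache"
    SCRATCHPAD = "scratchpad"


@dataclass
class Counter:
    """Hypothetical synchronization counter: marks itself fired at zero."""
    value: int
    fired: bool = False

    def add(self, delta: int) -> None:
        self.value += delta
        if self.value == 0:
            self.fired = True


@dataclass
class Node:
    """Toy model of one processor's local SRAM (sizes are illustrative)."""
    num_lines: int = 16
    line_bytes: int = 64
    modes: list = field(default_factory=list)
    data: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every line starts out as cache; software re-tags lines at run time.
        self.modes = [LineMode.CACHE] * self.num_lines
        self.data = [bytes(self.line_bytes) for _ in range(self.num_lines)]

    def configure(self, first: int, count: int, mode: LineMode) -> None:
        """Re-tag `count` consecutive lines starting at `first`."""
        for i in range(first, first + count):
            self.modes[i] = mode

    def scratchpad_bytes(self) -> int:
        """Total capacity currently operating as scratchpad."""
        return self.line_bytes * sum(
            m is LineMode.SCRATCHPAD for m in self.modes
        )


def issue_rdma(src: Node, src_line: int, dst: Node, dst_line: int,
               counter: Counter) -> None:
    """Copy one scratchpad line to a remote node and credit the counter.

    In this model both endpoints must be scratchpad lines; the counter is
    decremented by the number of bytes delivered.
    """
    for node, line in ((src, src_line), (dst, dst_line)):
        if node.modes[line] is not LineMode.SCRATCHPAD:
            raise ValueError("RDMA endpoints must be scratchpad lines")
    dst.data[dst_line] = src.data[src_line]
    counter.add(-src.line_bytes)
```

In this sketch a receiver would preload the counter with the number of bytes it expects and treat `fired` as the arrival notification; a real design would deliver that notification in hardware rather than through a polled flag.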

Notes

  1. A write-back policy can also be used, provided that coherence between L1 and L2 is maintained. However, the write-through policy simplifies coherence without any performance loss. The inclusion property assumed here is more intuitive than exclusion, which would require moving locked lines between the cache levels.

Acknowledgments

This work was supported by the European Commission in the context of the projects SARC (FP6 IP #27648) and UNiSIX (Marie-Curie #509595). We also thank, for their assistance in designing the architecture and developing the prototype: Dimitris Nikolopoulos, Alex Ramirez, Georgi Gaydadjiev, Spyros Lyberis, Christos Sotiriou, Dimitris Tsaliagos, and Michael Ligerakis.

Corresponding author

Correspondence to Vassilis Papaefstathiou.

Copyright information

© 2019 Springer-Verlag GmbH Germany, part of Springer Nature

About this chapter

Cite this chapter

Kalokerinos, G. et al. (2019). Prototyping a Configurable Cache/Scratchpad Memory with Virtualized User-Level RDMA Capability. In: Silvano, C., Bertels, K., Schulte, M. (eds) Transactions on High-Performance Embedded Architectures and Compilers V. Lecture Notes in Computer Science, vol 11225. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-58834-5_6

  • DOI: https://doi.org/10.1007/978-3-662-58834-5_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-58833-8

  • Online ISBN: 978-3-662-58834-5

  • eBook Packages: Computer Science, Computer Science (R0)
