An Efficient Lightweight Shared Cache Design for Chip Multiprocessors

  • Conference paper
Advanced Parallel Processing Technologies (APPT 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Abstract

The large working sets of commercial and scientific workloads favor a shared L2 cache design that maximizes aggregate cache capacity and minimizes off-chip memory requests in chip multiprocessors (CMPs). However, as the number of cores grows exponentially, the memory cost of the directory grows commensurately, which severely restricts scalability. To overcome this hurdle, this paper proposes a novel Lightweight Shared Cache design, which uses two small, fast caches in each tile of the CMP to store and manage the data and directory vectors for the blocks recently cached by L1 caches. The proposed scheme removes the directory vectors from the L2 cache, thereby decreasing on-chip directory memory overhead and improving scalability. Moreover, it significantly reduces L1 cache miss latencies, which improves program performance by 6% on average and by up to 16% in the best case, with 0.18% storage overhead.
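To illustrate the scalability problem described above, the following minimal sketch estimates the storage cost of keeping a directory vector alongside every cache block. It assumes a conventional full-map directory (one presence bit per core per block) and a 64-byte block; both are assumptions made for illustration, not parameters from the paper. The function name is hypothetical, and state bits and tags are ignored. The result does not correspond to the 0.18% overhead reported for the proposed design.

```python
def full_map_directory_overhead(num_cores: int, block_bytes: int = 64) -> float:
    """Return directory storage as a fraction of data storage.

    Assumes a full-map bit vector: one presence bit per core for each
    cache block. The 64-byte default block size is an illustrative
    assumption, not a value taken from the paper.
    """
    if num_cores <= 0 or block_bytes <= 0:
        raise ValueError("num_cores and block_bytes must be positive")
    block_bits = block_bytes * 8
    return num_cores / block_bits


if __name__ == "__main__":
    # The overhead grows linearly with the core count.
    for cores in (4, 16, 64, 256):
        overhead = full_map_directory_overhead(cores)
        print(f"{cores:4d} cores: {overhead:.2%} directory overhead")
```

Under these assumptions the per-block vector grows in step with the core count, which is why removing directory vectors from the large L2 cache and keeping them only for recently cached blocks can reduce on-chip overhead.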

This work has been supported by NSFC grants No. 60833004, No. 60773146 and No. 60673145.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, J., Wang, D., Xue, Y., Wang, H. (2009). An Efficient Lightweight Shared Cache Design for Chip Multiprocessors. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_3

  • DOI: https://doi.org/10.1007/978-3-642-03644-6_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03643-9

  • Online ISBN: 978-3-642-03644-6

  • eBook Packages: Computer Science, Computer Science (R0)
