An Efficient Lightweight Shared Cache Design for Chip Multiprocessors

Wang, Jinglei; Wang, Dongsheng; Xue, Yibo; Wang, Haixia

doi:10.1007/978-3-642-03644-6_3

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5737)

Included in the following conference series:

International Workshop on Advanced Parallel Processing Technologies

Abstract

The large working sets of commercial and scientific workloads favor a shared L2 cache design that maximizes the aggregate cache capacity and minimizes off-chip memory requests in chip multiprocessors (CMPs). The exponential increase in the number of cores causes a commensurate increase in the memory cost of the directory, which severely restricts its scalability. To overcome this hurdle, this paper proposes a novel Lightweight Shared Cache design, which uses two small, fast caches in each tile of the CMP to store and manage the data and the directory vectors of the blocks recently cached by the L1 caches. The proposed scheme removes the directory vectors from the L2 cache, thereby decreasing the on-chip directory memory overhead and improving scalability. Moreover, it significantly reduces L1 cache miss latencies, which improves program performance by 6% on average and by up to 16%, with a storage overhead of 0.18%.
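The directory scaling problem mentioned above can be illustrated with a short calculation. The sketch below is not taken from the paper: it assumes a conventional full-map directory (one presence bit per core attached to every L2 block) and a 64-byte block size, and it ignores tag and state bits. The function name `full_map_overhead` is hypothetical.

```python
def full_map_overhead(num_cores: int, block_bytes: int = 64) -> float:
    """Directory storage as a fraction of data storage per L2 block.

    Assumption (not from the paper): a full-map bit vector with one
    presence bit per core, and a 64-byte cache block by default.
    """
    data_bits = block_bytes * 8   # bits of data held in one cache block
    vector_bits = num_cores       # one presence bit per core
    return vector_bits / data_bits


if __name__ == "__main__":
    # The overhead grows linearly with the core count.
    for cores in (16, 64, 256, 1024):
        print(f"{cores:5d} cores: {full_map_overhead(cores):.1%} of block storage")
```

Under these assumptions the directory vector alone matches the size of the data block at 512 cores, which is why keeping vectors only for blocks recently cached by the L1 caches, as the abstract describes, is attractive.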

This work has been supported by NSFC grants No. 60833004, No. 60773146 and No. 60673145.

Author information

Authors and Affiliations

Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Jinglei Wang, Dongsheng Wang, Yibo Xue & Haixia Wang

Editor information

Editors and Affiliations

National University of Defense Technology, Department of Computer Science, 410073, Changsha, P.R. China
Yong Dou
Ecole Polytechnique Fédérale de Lausanne (EPFL), Dépt. Physique, 1015 Lausanne, Switzerland
Ralf Gruber
HSR - Hochschule für Technik Rapperswil, Oberseestr. 10, 8640 Rapperswil, Switzerland
Josef M. Joller

About this paper

Cite this paper

Wang, J., Wang, D., Xue, Y., Wang, H. (2009). An Efficient Lightweight Shared Cache Design for Chip Multiprocessors. In: Dou, Y., Gruber, R., Joller, J.M. (eds) Advanced Parallel Processing Technologies. APPT 2009. Lecture Notes in Computer Science, vol 5737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03644-6_3

DOI: https://doi.org/10.1007/978-3-642-03644-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03643-9
Online ISBN: 978-3-642-03644-6
eBook Packages: Computer Science, Computer Science (R0)
