Hierarchical Cache Directory for CMP

Guo, Song-Liu; Wang, Hai-Xia; Xue, Yi-Bo; Li, Chong-Min; Wang, Dong-Sheng

doi:10.1007/s11390-010-9321-5

Regular Paper
Published: 16 March 2010

Journal of Computer Science and Technology, Volume 25, pages 246–256 (2010)

Song-Liu Guo¹, Hai-Xia Wang², Yi-Bo Xue², Chong-Min Li¹ & Dong-Sheng Wang¹,²

Abstract

As more processing cores are integrated into one chip and feature sizes continue to shrink, the average latency of accesses to remote nodes under a directory-based coherence protocol increases, which significantly degrades system performance. Previous techniques such as data replication and data migration optimize the performance of the requesting core but offer little improvement for neighboring nodes. Other techniques, such as in-transit optimization, reduce latency at the cost of increased storage. This paper introduces a hierarchical cache directory for CMPs (chip multiprocessors) that divides the CMP tiles into multiple regions hierarchically and combines this organization with data replication. A new directory organization is proposed to record the sharing status within a region and to help the regional home complete operations efficiently. Simulation results show that, for a 16-core CMP, the hierarchical cache directory reduces average access latency by 9% and on-chip network traffic by 34% on average compared with a traditional directory, while using less storage. Theoretical analysis shows that, for a 2ⁿ × 2ⁿ tiled CMP, the average access latency of the hierarchical cache directory asymptotically approaches a function that is independent of n; hence the architecture is highly scalable.
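
The abstract does not spell out how regions are formed or how latency is modeled, so the following is only a minimal sketch under stated assumptions: tiles sit on a 2ⁿ × 2ⁿ mesh, each hierarchy level merges a 2 × 2 block of lower-level regions (a quadtree split), and hop count is Manhattan distance. The names `region_of` and `mean_hops` are hypothetical and do not come from the paper. The sketch shows how a tile maps to its enclosing region at each level, and how the average hop distance between two random tiles grows with mesh size when no hierarchy is used.

```python
from itertools import product


def region_of(x, y, level):
    """Return the (column, row) of the level-`level` region containing tile (x, y).

    Assumption (not from the paper): level 0 is a single tile, and each
    higher level merges a 2 x 2 block of lower-level regions.
    """
    return (x >> level, y >> level)


def mean_hops(k):
    """Average Manhattan distance between two tiles drawn uniformly
    (with replacement) from a k x k mesh. Illustrative only; this is
    not the latency model used in the paper."""
    tiles = list(product(range(k), repeat=2))
    total = sum(
        abs(ax - bx) + abs(ay - by)
        for (ax, ay) in tiles
        for (bx, by) in tiles
    )
    return total / len(tiles) ** 2


if __name__ == "__main__":
    # Tile (3, 2) on a 4 x 4 mesh (the 16-core case, n = 2).
    for level in range(3):
        print("level", level, "region", region_of(3, 2, level))
    # Flat-mesh average distance for n = 1, 2, 3.
    for n in (1, 2, 3):
        k = 2 ** n
        print(f"{k} x {k} mesh: mean hops = {mean_hops(k)}")
```

Under these assumptions the flat-mesh average distance keeps growing as the mesh is enlarged, which is the trend that motivates serving requests inside a bounded region where possible.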

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Beijing, 100084, China
Song-Liu Guo, Chong-Min Li (Student Member, CCF) & Dong-Sheng Wang (Senior Member, CCF)
Tsinghua National Laboratory of Information Science and Technology, Beijing, 100084, China
Hai-Xia Wang (Senior Member, CCF), Yi-Bo Xue (Senior Member, CCF) & Dong-Sheng Wang (Senior Member, CCF)

Corresponding author

Correspondence to Hai-Xia Wang.

Additional information

This work is supported by the National Natural Science Foundation of China under Grant Nos. 60673145, 60773146 and 60833004.

About this article

Cite this article

Guo, SL., Wang, HX., Xue, YB. et al. Hierarchical Cache Directory for CMP. J. Comput. Sci. Technol. 25, 246–256 (2010). https://doi.org/10.1007/s11390-010-9321-5

Received: 10 June 2009
Revised: 27 October 2009
Published: 16 March 2010
Issue Date: March 2010
DOI: https://doi.org/10.1007/s11390-010-9321-5
