Hierarchical Cache Directory for CMP

  • Regular Paper
  • Journal of Computer Science and Technology

Abstract

As more processing cores are integrated onto a single chip and feature sizes continue to shrink, the average latency of accessing remote nodes under a directory-based coherence protocol increases, which significantly degrades system performance. Previous techniques such as data replication and data migration optimize the performance of the requesting core but offer little improvement for neighboring nodes. Other techniques, such as in-transit optimization, reduce latency at the cost of increased storage. This paper introduces a hierarchical cache directory into the CMP (chip multiprocessor): it divides the CMP tiles hierarchically into multiple regions and combines this organization with data replication. A new directory organization is proposed to record the sharing status within a region and to help the regional home complete operations efficiently. Simulation results show that, for a 16-core CMP, the hierarchical cache directory reduces average access latency by 9% and on-chip network traffic by 34% on average compared with a traditional directory, while requiring less storage. Theoretical analysis shows that, for a 2n × 2n tiled CMP, the average access latency of the hierarchical cache directory asymptotically approaches a function that is independent of n; hence the architecture is highly scalable.
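
The hierarchical division of tiles into regions can be illustrated with a small sketch. The abstract does not say how regions are formed, so the code below assumes a square mesh whose side is a power of two and splits it recursively into four quadrants. The names `region_path` and `shared_levels` and the quadrant numbering are hypothetical and are not taken from the paper.

```python
def region_path(x, y, side):
    """Return the quadrant index of tile (x, y) at each level, coarsest first.

    Assumption (not from the paper): regions are formed by recursively
    splitting a side x side mesh into four equal quadrants.
    """
    if side < 1 or side & (side - 1):
        raise ValueError("side must be a power of two")
    if not (0 <= x < side and 0 <= y < side):
        raise ValueError("tile is outside the mesh")
    path = []
    half = side // 2
    while half >= 1:
        # Quadrants are numbered 0..3 in row-major order (an arbitrary choice).
        path.append(2 * (y >= half) + (x >= half))
        x, y = x % half, y % half
        half //= 2
    return path


def shared_levels(a, b, side):
    """Count how many nested regions, coarsest first, contain both tiles."""
    count = 0
    for ra, rb in zip(region_path(*a, side), region_path(*b, side)):
        if ra != rb:
            break
        count += 1
    return count


if __name__ == "__main__":
    # A 16-core CMP modelled as a 4 x 4 mesh has two levels of regions.
    print(region_path(3, 1, 4))               # [1, 3]
    print(shared_levels((0, 0), (1, 1), 4))   # 1
    print(shared_levels((0, 0), (3, 3), 4))   # 0
```

Under this assumed layout, two tiles that share more levels sit in a smaller common region, which is the kind of locality a regional home could exploit. The actual region shapes and home selection are defined in the full paper.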

Author information

Corresponding author

Correspondence to Hai-Xia Wang.

Additional information

This work is supported by the National Natural Science Foundation of China under Grant Nos. 60673145, 60773146 and 60833004.

About this article

Cite this article

Guo, SL., Wang, HX., Xue, YB. et al. Hierarchical Cache Directory for CMP. J. Comput. Sci. Technol. 25, 246–256 (2010). https://doi.org/10.1007/s11390-010-9321-5

  • DOI: https://doi.org/10.1007/s11390-010-9321-5
