Abstract
In this work we introduce power optimizations that rely on partial tag comparison (PTC) in snoop-based chip multiprocessors. Our optimizations build on the observation that detecting a tag mismatch in a snoop-based chip multiprocessor does not require processing the entire tag. In fact, a high percentage of cache mismatches can be detected using a small but highly informative subset of the tag bits.
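The idea can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of the low-order tag bits as the partial tag, the parameter `ptc_bits`, and the function names are all assumptions made for illustration. The sketch only shows the property the abstract depends on: if the partial tags differ, the full tags must differ, so the full comparison can be skipped.

```python
def partial(tag, ptc_bits=4):
    """Return the low-order ptc_bits of a tag (assumed choice of partial tag)."""
    return tag & ((1 << ptc_bits) - 1)


def snoop_lookup(stored_tags, snoop_tag, ptc_bits=4):
    """Look up snoop_tag among the tags stored in one cache set.

    Returns (hit, full_comparisons). A full comparison is performed only
    when the partial tags match; a partial mismatch already proves that
    the full tags differ.
    """
    full_comparisons = 0
    for stored in stored_tags:
        if partial(stored, ptc_bits) != partial(snoop_tag, ptc_bits):
            continue  # mismatch detected from the partial tag alone
        full_comparisons += 1
        if stored == snoop_tag:
            return True, full_comparisons
    return False, full_comparisons


# Hypothetical contents of a 4-way set.
ways = [0x1A3, 0x2B7, 0x3C3, 0x4D9]
miss_without_full_compare = snoop_lookup(ways, 0x5E1)
hit = snoop_lookup(ways, 0x2B7)
miss_with_false_partial_matches = snoop_lookup(ways, 0x7F3)
```

In this toy set, the first lookup is resolved without any full comparison, while the third shows that a partial match is only a hint and still needs the full tag to confirm it.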
Based on this observation, we introduce a source-based snoop filtering mechanism referred to as S-PTC. In S-PTC, possible remote tag mismatches are detected before the request is sent. S-PTC reduces power because it prevents unnecessary snoops from being sent and avoids unnecessary tag lookups at the end-points. Furthermore, S-PTC improves performance as a result of early cache miss detection.
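A source-side filter of this kind might be sketched as follows. The abstract does not say how the source learns about remote cache contents, so everything here is hypothetical: the class name, the per-core tables of partial-tag counts, and the fill and evict notifications that keep them up to date are assumptions chosen only to make the filtering decision concrete.

```python
from collections import Counter, defaultdict


class SourcePartialTagFilter:
    """Hypothetical source-side filter: tracks partial tags per remote core."""

    def __init__(self, num_remote, ptc_bits=4):
        self.mask = (1 << ptc_bits) - 1
        # One table per remote core: set index -> count of each partial tag.
        self.tables = [defaultdict(Counter) for _ in range(num_remote)]

    def on_remote_fill(self, core, index, tag):
        """Record that a remote core now holds a block (assumed notification)."""
        self.tables[core][index][tag & self.mask] += 1

    def on_remote_evict(self, core, index, tag):
        """Record that a remote core dropped a block (assumed notification)."""
        counts = self.tables[core][index]
        key = tag & self.mask
        if counts[key] > 0:
            counts[key] -= 1

    def snoop_targets(self, index, tag):
        """Return the remote cores whose partial tags could match.

        Cores left out are guaranteed mismatches, so no snoop is sent to
        them. Cores returned may still miss on the full tag.
        """
        key = tag & self.mask
        return [core for core, table in enumerate(self.tables)
                if table[index][key] > 0]


f = SourcePartialTagFilter(num_remote=3)
f.on_remote_fill(0, 5, 0x1A3)
f.on_remote_fill(2, 5, 0x2B3)
targets_partial_match = f.snoop_targets(5, 0x7F3)  # same low bits as both
targets_no_match = f.snoop_targets(5, 0x7F4)       # no snoop needed
```

An empty target list corresponds to the early miss detection mentioned above: the requester knows no remote cache can hold the block and can go straight to the next level of memory.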
For the SPLASH-2 benchmarks used in this study, S-PTC improves average performance by 2.9% to 3.5%, depending on the configuration. Our solutions reduce snoop request bandwidth by 78.5% to 81.9% and average tag array dynamic power by about 52%.
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Shafiee, A., Shahidi, N., Baniasadi, A. (2011). Using Partial Tag Comparison in Low-Power Snoop-Based Chip Multiprocessors. In: Varbanescu, A.L., Molnos, A., van Nieuwpoort, R. (eds) Computer Architecture. ISCA 2010. Lecture Notes in Computer Science, vol 6161. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24322-6_18
DOI: https://doi.org/10.1007/978-3-642-24322-6_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24321-9
Online ISBN: 978-3-642-24322-6
eBook Packages: Computer Science, Computer Science (R0)