SRP: Symbiotic Resource Partitioning of the Memory Hierarchy in CMPs

Srikantaiah, Shekhar; Kandemir, Mahmut

doi:10.1007/978-3-642-11515-8_21


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5952)

Abstract

Many recent works on Chip Multiprocessors (CMPs) have investigated the need for intelligent shared cache partitioning, which is believed to reduce the pressure on off-chip bandwidth. Management of the off-chip memory bandwidth to improve system performance and/or mitigate the performance volatility of applications has itself received considerable attention. Coordinated resource management schemes treat the interactions between cache allocation and bandwidth management as a black box, which hinders their ability to exploit the intricate inter-relationships between the two resource management strategies. In a multiprogrammed scenario, given the limited capacity of the on-chip cache, it is not feasible to eliminate off-chip accesses entirely. However, it is possible to mitigate the additional queueing delays at the memory controller by preventing multiple applications from exercising the off-chip bandwidth simultaneously. Therefore, from the point of view of improving system performance, it is more important to have a symbiotic resource partitioning scheme that partitions each resource based on the feedback it receives from the partitioning of the other.

Symbiotic resource partitioning (SRP), proposed in this paper, avoids scenarios in which multiple applications exercise the off-chip memory bandwidth simultaneously by appropriately controlling the cache partitioning. To control the cache partitioning, SRP employs an empirical model built on a metric, last level cache misses per cycle, that represents the off-chip memory bandwidth demand of the applications. The model captures the impact of cache partitioning on bandwidth demand by representing this metric as a function of the cache allocation per application, and it is dynamically updated to account for the phase behavior of the applications. SRP is an iterative approach: each iteration consists of an update to the model, cache partitioning, and bandwidth partitioning, with feedback from bandwidth partitioning that updates the model. Extensive simulations with a full system simulator and applications from the MiBench benchmark suite show that SRP leads to a significant overall improvement in system performance compared to state-of-the-art cache and bandwidth management schemes.
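The iteration structure described above (model update, cache partitioning, bandwidth partitioning, feedback) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the form of the empirical model or the partitioning policies, so the smoothed lookup table, the inverse-scaling fallback, the greedy way allocation, the demand-proportional bandwidth shares, and every name (`MpcModel`, `partition_cache`, `partition_bandwidth`, `srp_iteration`, `measure`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class MpcModel:
    """Hypothetical per-application model: LLC misses per cycle (MPC) per way count."""
    alpha: float = 0.5  # smoothing factor for updates (assumed, not from the paper)
    table: Dict[int, float] = field(default_factory=dict)  # ways -> smoothed MPC

    def update(self, ways: int, observed_mpc: float) -> None:
        # An exponential moving average lets the model follow phase changes.
        old = self.table.get(ways)
        self.table[ways] = (observed_mpc if old is None
                            else (1 - self.alpha) * old + self.alpha * observed_mpc)

    def predict(self, ways: int) -> float:
        if ways in self.table:
            return self.table[ways]
        if not self.table:
            return 1.0 / ways  # assumed prior: MPC falls inversely with allocated ways
        nearest = min(self.table, key=lambda w: abs(w - ways))
        return self.table[nearest] * nearest / ways  # assumed inverse scaling


def partition_cache(models: List[MpcModel], total_ways: int) -> List[int]:
    """Greedy stand-in policy: give each spare way to the largest predicted MPC drop."""
    alloc = [1] * len(models)  # every application keeps at least one way
    for _ in range(total_ways - len(models)):
        gains = [m.predict(w) - m.predict(w + 1) for m, w in zip(models, alloc)]
        alloc[gains.index(max(gains))] += 1
    return alloc


def partition_bandwidth(models: List[MpcModel], alloc: List[int]) -> List[float]:
    """Stand-in policy: bandwidth shares proportional to predicted MPC demand."""
    demand = [m.predict(w) for m, w in zip(models, alloc)]
    total = sum(demand)
    if total == 0:
        return [1.0 / len(models)] * len(models)
    return [d / total for d in demand]


def srp_iteration(
    models: List[MpcModel],
    total_ways: int,
    measure: Callable[[List[int], List[float]], List[float]],
) -> Tuple[List[int], List[float]]:
    """One iteration: partition cache, partition bandwidth, feed observations back."""
    alloc = partition_cache(models, total_ways)
    shares = partition_bandwidth(models, alloc)
    observed = measure(alloc, shares)  # hypothetical hook returning per-app MPC
    for model, ways, mpc in zip(models, alloc, observed):
        model.update(ways, mpc)
    return alloc, shares
```

In this sketch, `measure` stands in for whatever hardware counters or simulator statistics report the per-application misses per cycle under the chosen partitions; calling `srp_iteration` repeatedly lets each cache decision reflect the observations gathered under the previous bandwidth decision.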

This research is supported in part by NSF grants CNS #0720645, CCF #0811687, CCF #0702519, CNS #0202007 and CNS #0509251, a grant from Microsoft Corporation and support from the Gigascale Systems Research Focus Center, one of the five research centers funded under SRC’s Focus Center Research Program.



Author information

Authors and Affiliations

Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA, 16802, USA
Shekhar Srikantaiah & Mahmut Kandemir


Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, The University of Texas at Austin, 1 University Station C0803, Austin, TX 78712-0240, USA
Yale N. Patt
Dipartimento di Ingegneria della Informazione, Università di Pisa, Via Diotisalvi 2, 56100, Pisa, Italy
Pierfrancesco Foglia
IBM T.J. Watson Research Center, 19 Skyline Drive, Hawthorne, NY 10532, USA
Evelyn Duesterwald
Hewlett-Packard, Cami de Can Graells 1-21, Sant Cugat del Vallés, 08174, Barcelona, Spain
Paolo Faraboschi
Computer Architecture Department, Technical University of Catalunya (UPC), c/Jordi Girona 1-3, 08034, Barcelona, Spain
Xavier Martorell

Cite this paper

Srikantaiah, S., Kandemir, M. (2010). SRP: Symbiotic Resource Partitioning of the Memory Hierarchy in CMPs. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_21

DOI: https://doi.org/10.1007/978-3-642-11515-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11514-1
Online ISBN: 978-3-642-11515-8
eBook Packages: Computer Science, Computer Science (R0)
