Skip to main content

SRP: Symbiotic Resource Partitioning of the Memory Hierarchy in CMPs

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5952))

Abstract

There have been many recent works in the context of Chip Multiprocessors (CMPs) investigating the need of intelligent shared cache partitioning which is believed to reduce the pressure on the off-chip bandwidth. Management of the off-chip memory bandwidth to improve system performance and/or mitigate performance volatility of applications has itself received considerable attention. Coordinated resource management schemes treat the interactions between cache allocation and bandwidth management as a black-box. This hinders the ability of these schemes from exploiting the intricate inter-relationships between the resource management strategies. In a multiprogrammed scenario, given the limited availability of the on-chip cache, it is not feasible to entirely eliminate off-chip accesses. However, it is possible to mitigate the impact of additional queueing delays associated with the memory controller by avoiding multiple applications from exercising the off-chip bandwidth simultaneously. Therefore, from the point of view of improving system performance, it is more important to have a symbiotic resource partitioning scheme that performs partitioning of each resource based on feedback it receives from the partitioning of the other.

Symbiotic resource partitioning (SRP) proposed in this paper avoids the scenarios of multiple applications exercising the off-chip memory bandwidth simultaneously by appropriately controlling the cache partitioning. In order to control the cache partitioning, SRP employs an empirical model that relies on a metric (last level cache misses per cycle) that represents the off-chip memory bandwidth demand of the applications and models the impact of cache partitioning on bandwidth demand by representing the last level cache misses per cycle metric as a function of the cache allocation per application. This model is dynamically updated to account for the phase behavior of the applications. Moreover, SRP is an iterative approach wherein each iteration of the approach consists of an update to the model, cache partitioning and bandwidth partitioning with a feedback from bandwidth partitioning that updates the model. Extensive simulations with a full system simulator and applications from the MiBench benchmark suite shows that SRP leads to a significant overall improvement in system performance as compared to a state-of the-art cache and bandwidth management schemes.

This research is supported in part by NSF grants CNS #0720645, CCF #0811687, CCF #0702519, CNS #0202007 and CNS #0509251, a grant from Microsoft Corporation and support from the Gigascale Systems Research Focus Center, one of the five research centers funded under SRC’s Focus Center Research Program.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Chandra, D., Guo, F., Kim, S., Solihin, Y.: Predicting inter-thread cache contention on a chip multi-processor architecture. In: Proc. of the 11th International Symposium on High-Performance Computer Architecture (2005)

    Google Scholar 

  2. Srikantaiah, S., Kandemir, M., Irwin, M.J.: Adaptive set pinning: managing shared caches in chip multiprocessors. In: Proc. of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems (2008)

    Google Scholar 

  3. Hsu, L.R., Reinhardt, S.K., Iyer, R., Makineni, S.: Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource. In: Proc. of the 15th International Conference on Parallel Architectures and Compilation Techniques (2006)

    Google Scholar 

  4. Burger, D., Goodman, J.R., Kägi, A.: Memory bandwidth limitations of future microprocessors. In: Proceedings of the International Symposium on Computer Architecture (1996)

    Google Scholar 

  5. Chang, J., Sohi, G.S.: Cooperative cache partitioning for chip multiprocessors. In: Proc. of the 21st Annual International Conference on Supercomputing (2007)

    Google Scholar 

  6. Guo, F., Solihin, Y., Zhao, L., Iyer, R.: A framework for providing quality of service in chip multi-processors. In: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (2007)

    Google Scholar 

  7. Kim, S., Chandra, D., Solihin, Y.: Fair cache sharing and partitioning in a chip multiprocessor architecture. In: Proc. of the 13th International Conference on Parallel Architectures and Compilation Techniques (2004)

    Google Scholar 

  8. Qureshi, M.K., Patt, Y.N.: Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In: Proc. of the 39th Annual International Symposium on Microarchitecture (2006)

    Google Scholar 

  9. Rafique, N., Lim, W.T., Thottethodi, M.: Architectural support for operating system-driven CMP cache management. In: Proc. of the 15th International Conference on Parallel Architectures and Compilation Techniques (2006)

    Google Scholar 

  10. Suh, G.E., Rudolph, L., Devadas, S.: Dynamic partitioning of shared cache memory. J. Supercomput. 28(1) (2004)

    Google Scholar 

  11. Ipek, E., Mutlu, O., Martínez, J.F., Caruana, R.: Self-optimizing memory controllers: A reinforcement learning approach. In: Proceedings of the 35th International Symposium on Computer Architecture (2008)

    Google Scholar 

  12. Lee, C.J., Mutlu, O., Narasiman, V., Patt, Y.N.: Prefetch-aware DRAM controllers. In: Proceedings of the International Symposium on Microarchitecture (2008)

    Google Scholar 

  13. Mutlu, O., Moscibroda, T.: Stall-time fair memory access scheduling for chip multiprocessors. In: Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture (2007)

    Google Scholar 

  14. Mutlu, O., Moscibroda, T.: Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems. In: Proceedings of the 35th International Symposium on Computer Architecture (2008)

    Google Scholar 

  15. Nesbit, K.J., Aggarwal, N., Laudon, J., Smith, J.E.: Fair queuing memory systems. In: Proceedings of the International Symposium on Microarchitecture (2006)

    Google Scholar 

  16. Rafique, N., Lim, W.T., Thottethodi, M.: Effective management of DRAM bandwidth in multicore processors. In: Proc. of the 16th International Conference on Parallel Architecture and Compilation Techniques (2007)

    Google Scholar 

  17. Iyer, R., Zhao, L., Guo, F., Illikkal, R., Makineni, S., Newell, D., Solihin, Y., Hsu, L., Reinhardt, S.: QoS policies and architecture for cache/memory in CMP platforms. SIGMETRICS Perform. Eval. Rev. 35(1) (2007)

    Google Scholar 

  18. Bitirgen, R., Ipek, E., Martinez, J.F.: Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach. In: Proceedings of the International Symposium on Microarchitecture (2008)

    Google Scholar 

  19. Alameldeen, A.R., Wood, D.A.: Ipc considered harmful for multiprocessor workloads. IEEE Micro 26(4) (2006)

    Google Scholar 

  20. Guthaus, M.R., Ringenberg, J.S., Ernst, D., Austin, T.M., Mudge, T., Brown, R.B.: Mibench: A free, commercially representative embedded benchmark suite. In: Proceedings of the IEEE International Workshop on Workload Characterization (2001)

    Google Scholar 

  21. Micron: 1GB DDR2 SDRAM component: MT47H128M8HQ-25 (May 2007), http://download.micron.com/pdf/datasheets/dram/ddr2/1GbDDR2.pdf

  22. Martin, M.M.K., Sorin, D.J., Beckmann, B.M., Marty, M.R., Xu, M., Alameldeen, A.R., Moore, K.E., Hill, M.D., Wood, D.A.: Multifacet’s general execution-driven multiprocessor simulator (gems) toolset. SIGARCH Comput. Archit. News 33(4) (2005)

    Google Scholar 

  23. Rixner, S., Dally, W.J., Kapasi, U.J., Mattson, P., Owens, J.D.: Memory access scheduling. SIGARCH Comput. Archit. News 28(2) (2000)

    Google Scholar 

  24. Iyer, R.: CQoS: a framework for enabling QoS in shared caches of CMP platforms. In: Proc. of the 18th annual International Conference on Supercomputing (2004)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Srikantaiah, S., Kandemir, M. (2010). SRP: Symbiotic Resource Partitioning of the Memory Hierarchy in CMPs. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2010. Lecture Notes in Computer Science, vol 5952. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11515-8_21

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-11515-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-11514-1

  • Online ISBN: 978-3-642-11515-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics