Abstract
Processors have evolved toward the multi-core architecture, now the de facto standard. Continuous advances in technology allow for increased component density, resulting in a larger number of cores on the chip. This, in turn, places pressure on off-chip and pin bandwidth. Large Last-Level Caches (LLCs), shared among all cores, have been used as a way to control the number of off-chip requests.
In this work we analyze the memory behavior of a modern, demanding application: a graph-based database workload that is representative of future workloads. We evaluate the performance of this application for different cache configurations in terms of memory access time, bandwidth requirements, and power consumption. The experimental results show that bandwidth requirements decrease as the number of clusters decreases and the LLC size per cluster increases. This configuration is also the most power-efficient. If, on the other hand, memory latency is the dominant factor and bandwidth is assumed not to be a limitation, then the best configuration is the one with more clusters and smaller LLCs.
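The bandwidth trend described above can be illustrated with a toy model. One plausible contributing mechanism, assumed here rather than taken from the paper, is that when all threads share a graph working set, splitting a fixed LLC budget into per-cluster slices both shrinks each slice and replicates shared lines across slices. The Python sketch below counts LLC misses as a proxy for off-chip requests. Everything in it is hypothetical: the names `LRUCache` and `offchip_misses`, the fully associative LRU policy, the round-robin core-to-cluster mapping, the synthetic uniform-random trace over 96 shared lines, and the 128-line total LLC budget. It is not the authors' methodology and models neither latency nor power.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Hypothetical fully associative LRU cache holding `capacity` lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        """Return True on a hit; on a miss, fill the line and return False."""
        if addr in self.lines:
            self.lines.move_to_end(addr)  # mark as most recently used
            return True
        self.lines[addr] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def offchip_misses(trace, num_clusters, total_lines):
    """Count LLC misses when `total_lines` of LLC are split evenly among clusters.

    `trace` is a list of (core_id, line_address) pairs. Cores are mapped to
    clusters round-robin, and each cluster owns a private LLC slice.
    """
    per_cluster = total_lines // num_clusters
    caches = [LRUCache(per_cluster) for _ in range(num_clusters)]
    misses = 0
    for core, addr in trace:
        if not caches[core % num_clusters].access(addr):
            misses += 1  # each miss stands in for one off-chip request
    return misses


random.seed(0)
NUM_CORES = 16
shared = list(range(96))  # hypothetical shared graph data, in cache lines
# Cores issue accesses in round-robin order, all drawing from the shared set.
trace = [(c, random.choice(shared)) for _ in range(2000) for c in range(NUM_CORES)]

results = {k: offchip_misses(trace, k, total_lines=128) for k in (1, 2, 4, 8, 16)}
for clusters, misses in results.items():
    print(f"{clusters:2d} clusters: {misses} LLC misses")
```

With a single 128-line LLC, the whole 96-line working set fits, so only compulsory misses remain. Each further split shrinks the slices below the working set and the miss count grows. The model deliberately omits hit latency, which is where smaller LLCs would be expected to win in the latency-dominated scenario the abstract describes.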
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Trancoso, P., Martinez, N., Larriba-Pey, JL. (2011). Memory-, Bandwidth-, and Power-Aware Multi-core for a Graph Database Workload. In: Berekovic, M., Fornaciari, W., Brinkschulte, U., Silvano, C. (eds) Architecture of Computing Systems - ARCS 2011. ARCS 2011. Lecture Notes in Computer Science, vol 6566. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19137-4_15
DOI: https://doi.org/10.1007/978-3-642-19137-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19136-7
Online ISBN: 978-3-642-19137-4
eBook Packages: Computer Science, Computer Science (R0)