Collaborative Memories in Clusters: Opportunities and Challenges

Samih, Ahmad; Wang, Ren; Maciocco, Christian; Kharbutli, Mazen; Solihin, Yan

doi:10.1007/978-3-642-54212-1_2

Part of the book series: Lecture Notes in Computer Science (TCOMPUTATSCIE, volume 8360)

Abstract

Highly integrated distributed systems such as Intel Micro Server and SeaMicro Server are becoming an increasingly popular server architecture. Designers of such systems face interesting memory hierarchy design challenges as they attempt to reduce or eliminate the notorious swapping to disk storage, which slows down application execution drastically. Swapping to free remote memory on nearby nodes through memory collaboration has been shown to be cost-effective compared to overprovisioning memory for peak load requirements. Recent studies propose several ways to access under-utilized remote memory in static system configurations, but do not explore dynamic memory collaboration in detail. Dynamic collaboration is important given the run-time fluctuations in memory usage in clustered systems. Furthermore, with the growing interest in memory collaboration, it is crucial to understand the existing performance bottlenecks, overheads, and potential optimizations.

In this paper, we address these two issues. First, we propose an Autonomous Collaborative Memory System (ACMS) that manages memory resources dynamically at run time to optimize performance and provide QoS measures for nodes participating in the system. We implement a prototype of the proposed ACMS, experiment with a wide range of real-world applications, and show up to 3x speedup compared to a non-collaborative memory system, without perceptible performance impact on nodes that provide memory. Second, we analyze in depth the end-to-end overhead and bottlenecks of memory collaboration. Based on this analysis, we offer insights into several corresponding optimizations that could further improve performance.

Author information

Authors and Affiliations

Intel Architecture Group, Austin, Texas, USA
Ahmad Samih
Intel Labs, Hillsboro, Oregon, USA
Ren Wang & Christian Maciocco
Jordan University of Science and Technology, Irbid, Jordan
Mazen Kharbutli
North Carolina State University, Raleigh, NC, USA
Yan Solihin

Editor information

Editors and Affiliations

University of Calgary, AB, Canada
Marina L. Gavrilova
CloudFabriQ Ltd., London, UK
C. J. Kenneth Tan

About this chapter

Cite this chapter

Samih, A., Wang, R., Maciocco, C., Kharbutli, M., Solihin, Y. (2014). Collaborative Memories in Clusters: Opportunities and Challenges. In: Gavrilova, M.L., Tan, C.J.K. (eds) Transactions on Computational Science XXII. Lecture Notes in Computer Science, vol 8360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54212-1_2

DOI: https://doi.org/10.1007/978-3-642-54212-1_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54211-4
Online ISBN: 978-3-642-54212-1
eBook Packages: Computer Science, Computer Science (R0)