Collaborative Memories in Clusters: Opportunities and Challenges

  • Ahmad Samih
  • Ren Wang
  • Christian Maciocco
  • Mazen Kharbutli
  • Yan Solihin
Chapter
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8360)

Abstract

Highly integrated distributed systems such as Intel Micro Server and SeaMicro Server are becoming an increasingly popular server architecture. Designers of such systems face interesting memory hierarchy design challenges as they attempt to reduce or eliminate swapping to disk, which slows application execution drastically. Swapping instead to the free memory of nearby remote nodes, through Memory Collaboration, has been shown to be cost-effective compared to overprovisioning memory for peak load requirements. Recent studies propose several ways to access under-utilized remote memory in static system configurations, but do not explore dynamic memory collaboration in detail. Dynamic collaboration is important given the run-time fluctuations in memory usage in clustered systems. Furthermore, with the growing interest in memory collaboration, it is crucial to understand the existing performance bottlenecks, overheads, and potential optimizations.

In this paper we address these two issues. First, we propose an Autonomous Collaborative Memory System (ACMS) that manages memory resources dynamically at run time to optimize performance and provide QoS measures for the nodes participating in the system. We implement a prototype of ACMS, experiment with a wide range of real-world applications, and show up to a 3x speedup over a non-collaborative memory system, without perceivable performance impact on the nodes that provide memory. Second, we analyze in depth the end-to-end overhead and bottlenecks of memory collaboration. Based on this analysis, we provide insights into several corresponding optimizations to further improve performance.

Keywords

Local Memory, Memory Usage, Round Trip Time, Hard Disk Drive, Remote Memory
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Ahmad Samih (1)
  • Ren Wang (2)
  • Christian Maciocco (2)
  • Mazen Kharbutli (3)
  • Yan Solihin (4)

  1. Intel Architecture Group, Austin, USA
  2. Intel Labs, Hillsboro, USA
  3. Jordan University of Science and Technology, Irbid, Jordan
  4. North Carolina State University, Raleigh, USA