Abstract
Many Big Data analytics and IoT scenarios rely on fast, non-relational (NoSQL) storage to help process massive amounts of data. In addition, managed runtimes (e.g. the JVM) are now widely used to support the execution of these NoSQL storage solutions, particularly for Big Data applications driven by key-value stores. The benefits of such runtimes can, however, be limited by automatic memory management, i.e., Garbage Collection (GC), which does not consider object locality, so objects that point to each other end up dispersed in memory. In the long run, this may break the service levels of applications because of extra page faults and degraded locality in system-level memory caches. We propose LAG1 (short for Locality-Aware G1), an extension of modern heap layouts that promotes locality between groups of related objects. This is done with no prior application profiling and in a way that is transparent to the programmer, without requiring changes to existing code. The heap layout and algorithmic extensions are implemented on top of the Garbage First (G1) garbage collector (the new default collector) of the HotSpot JVM. Using the YCSB benchmarking tool to benchmark HBase, a well-known and widely used Big Data application, we show negligible overhead in frequent operations such as the allocation of new objects, and significant improvements when accessing data, supported by higher hit rates in system-level memory structures.
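The locality problem described in the abstract can be illustrated with a toy model. The Python sketch below is not LAG1 or G1 code; it is a minimal simulation, under assumed names (`placement_order`, `mean_reference_distance`, `key_value_table` are all hypothetical), of how the order in which a copying collector places objects changes the distance between an object and the objects it references.

```python
from collections import deque


def placement_order(root, children, depth_first):
    """Order in which a toy copying collector lays out reachable objects.

    Breadth-first (queue) mimics a Cheney-style scan; depth-first (stack)
    places each object's referents right after it.
    """
    order, seen, work = [], {root}, deque([root])
    while work:
        obj = work.pop() if depth_first else work.popleft()
        order.append(obj)
        kids = children.get(obj, [])
        # Reverse for the stack so children are still visited left to right.
        for child in (reversed(kids) if depth_first else kids):
            if child not in seen:
                seen.add(child)
                work.append(child)
    return order


def mean_reference_distance(order, children):
    """Average gap, in object slots, between an object and what it points to."""
    slot = {obj: i for i, obj in enumerate(order)}
    gaps = [abs(slot[p] - slot[c]) for p, kids in children.items() for c in kids]
    return sum(gaps) / len(gaps)


def key_value_table(n):
    """Hypothetical object graph: a table of n entries, each with a key and a value."""
    graph = {"table": [f"entry{i}" for i in range(n)]}
    for i in range(n):
        graph[f"entry{i}"] = [f"key{i}", f"value{i}"]
    return graph


graph = key_value_table(1000)
for name, depth_first in (("breadth-first", False), ("depth-first", True)):
    order = placement_order("table", graph, depth_first)
    print(name, mean_reference_distance(order, graph))
```

In this model the depth-first layout keeps every entry adjacent to its key and value, whereas the breadth-first layout places all entries first and their keys and values far behind them. The model counts object slots only; a real collector deals with object sizes, cache lines, and pages, so the numbers are indicative of the effect rather than of its magnitude.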
Notes
1. The first level of the CPU data cache.
2. The data Translation Lookaside Buffer.
3. The mechanism used in HotSpot to create Stop-the-World pauses. Garbage collection cycles run inside a safepoint, during which all application threads are stopped.
4. L1 is the first level of CPU cache: 32 KB in size, with 64 B per line, in modern models.
5. L2 is the second level of CPU cache: 256 KB in size, with 64 B per line, in modern models.
6.
7. A page walk consists of querying page-table entries to see whether the address the CPU is trying to load is present in physical memory.
Acknowledgements
This work was supported by national funds through Fundação para a Ciência e a Tecnologia with reference PTDC/EEI-SCR/6945/2014, and by the ERDF through COMPETE 2020 Programme, within project POCI-01-0145-FEDER-016883. This work was partially supported by Instituto Superior de Engenharia de Lisboa and Instituto Politécnico de Lisboa. This work was supported by national funds through Fundação para a Ciência e a Tecnologia (FCT) with reference UID/CEC/50021/2013.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Patrício, D., Bruno, R., Simão, J., Ferreira, P., Veiga, L. (2017). Locality-Aware GC Optimisations for Big Data Workloads. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10574. Springer, Cham. https://doi.org/10.1007/978-3-319-69459-7_4
DOI: https://doi.org/10.1007/978-3-319-69459-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69458-0
Online ISBN: 978-3-319-69459-7
eBook Packages: Computer Science (R0)