EarnCache: Self-adaptive Incremental Caching for Big Data Applications

  • Yifeng Luo
  • Junshi Guo
  • Shuigeng ZhouEmail author
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10988)


Memory caching plays a crucial role in satisfying the requirements for (quasi-)real-time processing of exploding data on big-data clusters. As big data clusters are usually shared by multiple computing frameworks, applications or end users, there exists intense competition for memory cache resources, especially on small clusters that are supposed to process comparably big datasets as large clusters do, yet with tightly limited resource budgets. Applying existing on-demand caching strategies on such shared clusters inevitably results in frequent cache thrashing when the conflicts of simultaneous cache resource demands are not mediated, which will deteriorate the overall cluster efficiency.

In this paper, we propose a novel self-adaptive incremental big data caching mechanism, called EarnCache, to improve the cache efficiency for shared big data clusters, especially for small clusters where cache thrashing may occur frequently. EarnCache self-adaptively adjusts resource allocation strategy according to the condition of cache resource competition: turning to incremental caching to depress competition when resource is in deficit, and returning to traditional on-demand caching to expedite data caching-in when resource is in surplus. Extensive experimental evaluation shows that the elasticity of EarnCache enhances the cache efficiency on shared big data clusters, and thus improves resource utilization.


Big data Cache management Self-adaptive and Incremental caching 



This work was supported by National Natural Science Foundation of China (NSFC) (No. U1636205), and the Science and Technology Innovation Action Program of Science and Technology Commission of Shanghai Municipality (STCSM) (No. 17511105204).


  1. 1.
    Zhang, J., Wu, G., Hu, X., et al.: A distributed cache for Hadoop distributed file system in real-time cloud services. In: Proceedings of GRID, pp. 12–21 (2012)Google Scholar
  2. 2.
    Ousterhout, J., Agrawal, P., Erickson, D., et al.: The case for RAMCloud. Commun. ACM 54(7), 121–130 (2011)CrossRefGoogle Scholar
  3. 3.
    Dean, J., Ghemawat, S., et al.: MapReduce: simplified data processing on large cluster. In: Proceedings of OSDI, pp. 137–150 (2004)Google Scholar
  4. 4.
    Li, H., Ghodsi, A., Zaharia, M., et al.: Tachyon: reliable, memory speed storage for cluster computing frameworks. In: Proceedings of SOCC, pp. 6:1–6:15 (2014)Google Scholar
  5. 5.
    Li, Y., Feng, D., Shi, Z.: Enhancing both fairness and performance using rate-aware dynamic storage cache partitioning. In: Proceedings of DISCS, pp. 31–36 (2013)Google Scholar
  6. 6.
    Shvachko, K., Kuang, H., Radia, S., et al.: The Hadoop distributed file system. In: Proceedings of MSST, pp. 121–134 (2010)Google Scholar
  7. 7.
    Ananthanarayanan, G., Ghodsi, A., Warfield, A., et al.: PACMan: coordinated memory caching for parallel jobs. In: Proceedings of NSDI, pp. 267–280 (2012)Google Scholar
  8. 8.
    Pu, Q., Li, H., Zaharia, M., et al.: FairRide: near-optimal, fair cache sharing. In: Proceedings of NSDI, pp. 393–406 (2016)Google Scholar
  9. 9.
    Zaharia, M., Chowdhury, M., Das, T., et al.: Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of NSDI, pp. 15–28 (2012)Google Scholar
  10. 10.
    Zhang, S., Han, J., Liu, Z., et al.: Accelerating MapReduce with distributed memory cache. In: Proceedings of ICPADS, pp. 472–478 (2009)Google Scholar
  11. 11.
    Luo, Y., Luo, S., Guan, J., et al.: A RAMCloud storage system based on HDFS: architecture, implementation and evaluation. J. Syst. Softw. 86(3), 744–750 (2013)MathSciNetCrossRefGoogle Scholar
  12. 12.
    Ma, Q., Steenkiste, P., Zhang, H.: Routing high-bandwidth traffic in max-min fair share networks. In: Proceedings of SIGCOMM, pp. 206–217 (1996)CrossRefGoogle Scholar
  13. 13.
    Cao, Z., Zegura, W.: Utility max-min: an application-oriented bandwidth allocation scheme. In: Proceedings of INFOCOM, pp. 793–801 (1999)Google Scholar
  14. 14.
    Luo, Y., Guo, J., Zhu, J., Guan, J., Zhou, S.: Towards efficiently supporting database as a service with QoS guarantees. J. Syst. Softw. 139, 51–63 (2018)CrossRefGoogle Scholar
  15. 15.
    Luo, Y., Guo, J., Zhu, J., Guan, J., Zhou, S.: Supporting cost-efficient multi-tenant database services with service level objectives (SLOs). In: Candan, S., Chen, L., Pedersen, T.B., Chang, L., Hua, W. (eds.) DASFAA 2017. LNCS, vol. 10177, pp. 592–606. Springer, Cham (2017). Scholar
  16. 16.
    Luo, Y., Shi, J., Zhou, S.: JeCache: just-enough data caching with just-in-time prefetching for big data applications. In: Proceedings of ICDCS, pp. 2405–2410 (2017)Google Scholar
  17. 17.
    Tang, S., Lee, B., He, B., et al.: Long-term resource fairness: towards economic fairness on pay-as-you-use computing systems. In: Proceedings of ICS, pp. 251–260 (2014)Google Scholar
  18. 18.
    Ghodsi, A., Zaharia, M., Hindman, B., et al.: Dominant resource fairness: fair allocation of multiple resource types. In: Proceedings of NSDI, pp. 323–336 (2011)Google Scholar
  19. 19.
  20. 20.
  21. 21.
  22. 22.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. 1.School of Computer Science, and Shanghai Key Lab of Intelligent Information ProcessingFudan UniversityShanghaiChina
  2. 2.School of Data Science and EngineeringEast China Normal UniversityShanghaiChina

Personalised recommendations