Towards Vertically Scalable Spark Applications

  • Luciano Baresi
  • Giovanni Quattrocchi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11339)


The dynamic provisioning of virtual machines (VMs) supported by many cloud computing infrastructures eases the scalability of software applications. Unfortunately, VMs are relatively slow to boot, and public cloud providers do not allow users to dynamically vary the resources allocated to them (vertical scalability). To tackle both problems, a few years ago we presented a solution that combines the management of VMs with the use of containers and specifically targets the efficient runtime management of the resources provisioned to Web applications. This paper borrows from that solution and addresses the problem of provisioning resources to big data Spark applications at runtime. Spark does not allow the resources associated with its executors to be scaled at runtime; they must be provisioned statically. To tackle this problem, the paper describes a container-based version of Spark that supports the dynamic resizing of the memory and CPU cores associated with the different executors. The evaluation demonstrates the feasibility of the approach and identifies the trade-offs involved.


Containers, Big data, Spark, Resource allocation



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria, Milan, Italy
