Big Data-Oriented PaaS Architecture with Disk-as-a-Resource Capability and Container-Based Virtualization

  • Jonatan Enes
  • Javier López Cacheiro
  • Roberto R. Expósito
  • Juan Touriño
Article

Abstract

With the increasing adoption of Big Data technologies as basic tools for the ongoing Digital Transformation, there is a high demand for data-intensive applications. To execute such applications efficiently, cloud providers must change the way hardware infrastructure resources are managed in order to improve application performance. However, the increasing use of virtualization technologies to achieve efficient usage of infrastructure resources continuously widens the gap between applications and the underlying hardware, thus decreasing resource efficiency for the end user. This scenario is especially troublesome for Big Data applications, as storage resources are among the most heavily virtualized, which imposes a significant overhead on large-scale data processing. This paper proposes a novel PaaS architecture specifically oriented to Big Data in which the scheduler offers disks as resources alongside the more common CPU and memory resources, with the aim of providing a better storage solution for the user. Furthermore, virtualization overheads are reduced to the bare minimum by replacing heavy hypervisor-based technologies with operating-system-level virtualization based on lightweight software containers. This architecture has been deployed on a Big Data infrastructure at the CESGA supercomputing center, which was used as a testbed to compare its performance with OpenStack, a popular private cloud platform. The results show significant performance improvements, reducing the execution time of representative Big Data workloads by up to 4.5×.
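
The idea of a scheduler that offers disks alongside CPU and memory can be illustrated with a minimal sketch. This is not the scheduler of the paper: the `Node` and `Request` types, the `place` function, the first-fit policy, and the device names are all hypothetical, chosen only to show whole disks being treated as a countable, allocatable resource.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A host with free CPU cores, free memory and a list of unassigned disks."""
    name: str
    free_cpus: int
    free_mem_gb: int
    free_disks: List[str] = field(default_factory=list)


@dataclass
class Request:
    """Resources a container asks for; disks are requested as whole devices."""
    cpus: int
    mem_gb: int
    disks: int


def place(request: Request, nodes: List[Node]) -> Optional[dict]:
    """First-fit placement (an assumed policy): pick the first node that
    satisfies CPU, memory and disk demands, then reserve those resources."""
    for node in nodes:
        fits = (node.free_cpus >= request.cpus
                and node.free_mem_gb >= request.mem_gb
                and len(node.free_disks) >= request.disks)
        if fits:
            granted = node.free_disks[:request.disks]
            node.free_disks = node.free_disks[request.disks:]
            node.free_cpus -= request.cpus
            node.free_mem_gb -= request.mem_gb
            return {"node": node.name, "disks": granted}
    return None  # no node can currently satisfy the request


if __name__ == "__main__":
    cluster = [Node("n1", 8, 32, ["sdb"]), Node("n2", 8, 32, ["sdb", "sdc"])]
    print(place(Request(cpus=4, mem_gb=16, disks=2), cluster))
```

In this sketch a request for two disks skips `n1`, which has only one free disk, and lands on `n2`, even though both nodes have enough CPU and memory. A real scheduler would also need to release resources and could weigh disk type or load, which the sketch leaves out.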

Keywords

Big data; Platform as a Service (PaaS); Cloud computing; Disk-as-a-resource scheduling; Operating-system-level virtualization

Notes

Acknowledgements

This work was supported by the Ministry of Economy, Industry and Competitiveness of Spain (Project TIN2016-75845-P, AEI/FEDER, EU), and by the FPU Program of the Ministry of Education (grant FPU15/03381).

Copyright information

© Springer Nature B.V. 2018

Authors and Affiliations

  1. Computer Architecture Group, Universidade da Coruña, A Coruña, Spain
  2. Fundación Centro de Supercomputación de Galicia (CESGA), Santiago de Compostela, Spain
