Advertisement

Journal of Grid Computing

, Volume 16, Issue 1, pp 55–83 | Cite as

The Flowbster Cloud-Oriented Workflow System to Process Large Scientific Data Sets

Article
  • 94 Downloads

Abstract

The paper describes a new cloud-oriented workflow system called Flowbster. It was designed to create efficient data pipelines in clouds by which large compute-intensive data sets can efficiently be processed. The Flowbster workflow can be deployed in the target cloud as a virtual infrastructure through which the data to be processed can flow and meanwhile it flows through the workflow it is transformed as the business logic of the workflow defines it. Instead of using the enactor based workflow concept Flowbster applies the service choreography concept where the workflow nodes directly communicate with each other. Workflow nodes are able to recognize if they can be activated with a certain data set without the interaction of central control service like the enactor in service orchestration workflows. As a result Flowbster workflows implement a much more efficient data path through the workflow than service orchestration workflows. A Flowbster workflow works as a data pipeline enabling the exploitation of pipeline parallelism, workflow parallel branch parallelism and node scalability parallelism. The Flowbster workflow can be deployed in the target cloud on-demand based on the underlying Occopus cloud deployment and orchestrator tool. Occopus guarantees that the workflow can be deployed in several major types of IaaS clouds (OpenStack, OpenNebula, Amazon, CloudSigma). It takes care of not only deploying the nodes of the workflow but also to maintain their health by using various health-checking options. Flowbster also provides an intuitive graphical user interface for end-user scientists. This interface hides the low level cloud-oriented layers and hence users can concentrate on the business logic of their data processing applications without having detailed knowledge on the underlying cloud infrastructure.

Keywords

Workflow Data stream Scientific data Cloud orchestration Multi-cloud 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Notes

Acknowledgements

This work is partially funded by the European CloudiFacturing - Cloudification of Production Engineering for Predictive Digital Manufacturing project under grant No. 768892 (H2020-FoF-2017), and by the International Science & Technology Cooperation Program of China under grant No. 2015DFE12860. On behalf of the Flowbster project we thank for the usage of MTA Cloud (https://cloud.mta.hu/) that significantly helped us achieving the results published in this paper.

References

  1. 1.
    Liu, J., Pacitti, E., Valduriez, P., Mattoso, M.: A survey of data-intensive scientific workflow management. J. Grid Comput. 13(4), 457–494 (2015)CrossRefGoogle Scholar
  2. 2.
    Deelman, E., Gannon, D., Shields, M., Taylor, I.: Workflows and e-science: an overview of workflow system features and capabilities. Futur. Gener. Comput. Syst. 25(5), 528–540 (2009)CrossRefGoogle Scholar
  3. 3.
    Yu, J., Buyya, R.: A taxonomy of workflow management systems for grid computing. J. Grid Comput. 3(3–4), 171–200 (2005)CrossRefGoogle Scholar
  4. 4.
    Deelman, E., Vahi, K., Juve, G., Rynge, M., Callaghan, S., Maechling, P.J., Mayani, R., Chen, W., Silva, R.F.d., Livny, M., Wenger, K.: Pegasus: a workflow management system for science automation. Futur. Gener. Comput. Syst. (2014)Google Scholar
  5. 5.
    Fahringer, T., Prodan, R., Duan, R., Hofer, J., Nadeem, F., Nerieri, F., Podlipnig, S., Qin, J., Siddiqui, M., Truong, H.-L., Villazon, A., Wieczorek, M.: Askalon: a development and grid computing environment for scientific workflows. In: Taylor, I. J., Deelman, E., Gannon, D. B., Shields, M (eds.) Workflows for E- Science, pp. 450–471. Springer, London (2007)Google Scholar
  6. 6.
    Altintas, I., Berkley, C., Jaeger, E., Jones, M., Ludascher, B., Mock, S.: Kepler: an extensible system for design and execution of scientific workflows. In: 16th International Conference on Scientific and Statistical Database Management (SSDBM), pp. 423–424 (2004)Google Scholar
  7. 7.
    Goecks, J., Nekrutenko, A., Taylor, J.: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 11(8), 1–13 (2010)CrossRefGoogle Scholar
  8. 8.
    Oinn, T.M., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, R.M., Carver, T., Glover, K., Pocock, M.R., Wipat, A., Li, P.: Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics 20(17), 3045–3054 (2004)CrossRefGoogle Scholar
  9. 9.
    Zaha, J.M., Barros, A., Dumas, M., ter Hofstede, A.: A language for service behavior modeling. In: CoopIS, Montpellier, France (2006)Google Scholar
  10. 10.
    Kavantzas, N., Burdett, D., Ritzinger, G., Lafon, Y.: Web services choreography description language version 1.0, W3C Candidate Recommendation. Tech. Rep. (2005)Google Scholar
  11. 11.
    Terstyanszky, G., Kukla, T., Kiss, T., Kacsuk, P., Balasko, A., Farkas, Z.: Enabling scientific workflow sharing through coarse-grained interoperability. Futur. Gener. Comput. Syst. 37, 46–59 (2014)CrossRefGoogle Scholar
  12. 12.
    Kacsuk, P., Farkas, Z., Kozlovszky, M., Herman, G., Balasko, A., Karoczkai, K., Marton, I.: WS-PGRADE/GUSE generic DCI gateway framework for a large variety of user communities. J. Grid Comput. 10(4), 601–630 (2012)CrossRefGoogle Scholar
  13. 13.
    Hajnal, Á, Márton, I, Farkas, Z., Kacsuk, P.: Remote storage management in science gateways via data bridging. Concurr. Comput.: Pract. Exp. 27(16), 4398–4411 (2015)CrossRefGoogle Scholar
  14. 14.
    Kacsuk, P. (ed.): Science gateways for distributed computing infrastructures: development framework and exploitation by scientific user communities. Springer International Publishing. ISBN: 978-3-319-11267-1 (2014)Google Scholar
  15. 15.
    Occopus github repository: https://github.com/occopus
  16. 16.
    Flowbster github repository: https://github.com/occopus/flowbster
  17. 17.
    Trott, O., Olson, A.J.: Autodock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J. Comput. Chem. 31, 455–461 (2010)Google Scholar
  18. 18.
    Farkas, Z., Kacsuk, P., Kiss, T., Borsody, P., Hajnal, Á, Balaskó, Á, Karóczkai, K: Autodock gateway for molecular docking simulations in cloud systems. In: Terzo, O., Mossucca, L. (eds.) Cloud Computing with E-Science Applications. p. 300. ISBN:978-1-4665-9115-8, pp. 217–235. CRC Press - Taylor and Francis Group, Boca Raton (2015)Google Scholar
  19. 19.
    Kiss, T., Kacsuk, P., Lovas, R., et al.: WS-PGRADE/GUSE in European Projects. In: Kacsuk, P (ed.) Science Gateways for Distributed Computing Infrastructures: Development Framework and Exploitation by Scientific User Communities, pp. 235–254. Springer, Berlin (2014)Google Scholar
  20. 20.
    D’Agostino, D., Danovaro, E., Clematis, A., Roverelli, L., Zereik, G., Galizia, A.: From lesson learned to the refactoring of the DRIHM science gateway for hydro-meteorological research. J. Grid Comput. 14(4), 575–588 (2016)CrossRefGoogle Scholar
  21. 21.
    Gesing, S., Kruger, J., Grunzke, R., Herres-Pawslis, S., Hoffmann, A.: Using science gateways for bridging the differences between research infrastructures. J. Grid Comput. 14(4), 545–557 (2016)CrossRefGoogle Scholar
  22. 22.
  23. 23.
  24. 24.
    Occopus tutorial webpage: http://occopus.lpds.sztaki.hu/tutorials
  25. 25.
  26. 26.
    Vahi, K., Rynge, M., Juve, G., Mayani, R., Deelman, E.: Rethinking data management for big data scientific workflows. In: 2013 IEEE International Conference on Big Data. Silicon Valley.  https://doi.org/10.1109/BigData.2013.6691724  https://doi.org/10.1109/BigData.2013.6691724 (2013)
  27. 27.
    Farkas, Z., Kacsuk, P., Hajnal, Á: Enabling workflow-oriented science gateways to access multi-cloud systems. J. Grid Comput. 14(4), 619–640 (2016)CrossRefGoogle Scholar
  28. 28.
    Flanagan, K., et al.: Microbase2.0: a generic framework for computationally intensive bioinformatics workflows in the cloud. J. Integr. Bioinform. (JIB).  https://doi.org/10.2390/biecoll-jib-2012-212 (2012)
  29. 29.
    Emeakaroha, V.C., Maurer, M., Stern, P., Labaj, P.P., Brandic, I., Kreil, D.P.: Managing and optimizing bioinformatics workflows for data analysis in clouds. J. Grid Comput. 11(3), 407–428 (2013)CrossRefGoogle Scholar
  30. 30.
    Balis, B., Figiela, K., Malawski, M., Pawlik, M., Bubak, M.: A lightweight approach for deployment of scientific workflows in cloud infrastructures. In: Parallel Processing and Applied Mathematics, Volume 9573 of the series Lecture Notes in Computer Science, pp. 281–290 (2016)Google Scholar
  31. 31.
    Qasha, R., et al.: A framework for scientific workflow reproducibility in the cloud. In: 2016 IEEE 12th International Conference on e-Science (e-Science), pp. 81–90. IEEE.  https://doi.org/10.1109/eScience.2016.7870888 (2016)
  32. 32.
    Qasha, R., et al.: Dynamic deployment of scientific workflows in the cloud using container virtualization. In: 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE, pp. 269–276.  https://doi.org/10.1109/CloudCom.2016.0052 (2016)
  33. 33.
    Kacsuk, P., Kecskemeti, G., Kertesz, A., Nemeth, Z., Visegradi, A., Gergely, M.: Infrastructure aware scientific workflows and their support by a Science Gateway. In: Proceedings of the 7th International Workshop on Science Gateways (IWSG 2015), pp. 22–27. Budapest (2015)Google Scholar
  34. 34.
  35. 35.
  36. 36.
  37. 37.
  38. 38.
  39. 39.
  40. 40.

Copyright information

© Springer Science+Business Media B.V., part of Springer Nature 2018

Authors and Affiliations

  1. 1.Institute for Computer Science and ControlHungarian Academy of SciencesBudapestHungary
  2. 2.Centre for Parallel ComputingUniversity of WestminsterLondonUK

Personalised recommendations