Workflow Scheduling Techniques for Big Data Platforms

  • Mihaela-Catalina Nita
  • Mihaela Vasile
  • Florin Pop
  • Valentin Cristea
Part of the Computer Communications and Networks book series (CCN)


Many applications in scientific fields such as physics, astronomy, biology, and earth science transform a set of data by applying iterative computation steps. From a computer science perspective, these steps can be seen as a pool of tasks with data dependencies. As application complexity grows, the number of workflows grows as well. Because a large variety of solutions exists for specific applications and platforms, a systematic analysis of the scheduling models, methods, and algorithms used in workflow applications is needed. This chapter provides a global picture of the existing solutions, to support optimal workflow scheduling choices.


Completion Time, Schedule Algorithm, Direct Acyclic Graph, Service Level Agreement, Business Process Execution Language
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



The research presented in this paper is supported by projects: DataWay: Real-time Data Processing Platform for Smart Cities: Making sense of Big Data—PN-II-RU-TE-2014-4-2731; MobiWay: Mobility Beyond Individualism: an Integrated Platform for Intelligent Transportation Systems of Tomorrow—PN-II-PT-PCCA-2013-4-0321; CyberWater grant of the Romanian National Authority for Scientific Research, CNDI-UEFISCDI, project number 47/2012; clueFarm: Information system based on cloud services accessible through mobile devices, to increase product quality and business development farms—PN-II-PT-PCCA-2013-4-0870.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Mihaela-Catalina Nita (1)
  • Mihaela Vasile (1)
  • Florin Pop (1, corresponding author)
  • Valentin Cristea (1)
  1. Computer Science Department, University Politehnica of Bucharest, Bucharest, Romania
