
Big Data Scientific Workflows in the Cloud: Challenges and Future Prospects

  • Chapter in: Cloud Computing for Geospatial Big Data Analytics

Part of the book series: Studies in Big Data (SBD, volume 49)

Abstract

The concept of workflows was introduced to mitigate the complexity of tasks in scientific computing and business analytics. Over time, workflows have found applications in many diverse fields and domains. Handling big data has given rise to further issues, such as growing computational complexity, increasing data sizes, resource provisioning, and the need for workflow systems that allow heterogeneous systems to work together. As a result, traditional systems are deemed obsolete for this purpose. To meet variable resource requirements, the cloud has emerged as an apparent solution. However, the execution and deployment of big data scientific workflows in the cloud still requires research attention before a synergistic model for it can be presented. This chapter identifies open research problems in this domain and offers insights into specific issues such as workflow scheduling and the execution and deployment of big data scientific workflows in multi-site cloud environments.
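To make the workflow scheduling problem mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' method: a workflow is modeled as a directed acyclic graph (DAG) of tasks, and each task is assigned greedily, in topological order, to the virtual machine on which it would finish earliest. The function name `schedule_workflow`, the per-VM runtime table, the VM names, and the single fixed `transfer_cost` (a crude stand-in for moving intermediate data between VMs or cloud sites) are all hypothetical assumptions made for illustration.

```python
from graphlib import TopologicalSorter


def schedule_workflow(runtimes, deps, transfer_cost):
    """Greedy earliest-finish-time list scheduling of a workflow DAG.

    Hypothetical illustration only.
    runtimes: {task: {vm: execution_time}}
    deps: {task: set of parent tasks}; tasks without parents may be omitted
    transfer_cost: fixed delay added when a parent ran on a different VM
    Returns ({task: (vm, start, finish)}, makespan).
    """
    # Every task becomes a node; its predecessors are its parent tasks.
    graph = {task: deps.get(task, set()) for task in runtimes}
    vm_free = {}      # time at which each VM finishes its last assigned task
    assignment = {}

    for task in TopologicalSorter(graph).static_order():
        best = None
        for vm, runtime in runtimes[task].items():
            # A task is ready once all parents have finished and their output
            # has reached this VM (extra delay if a parent ran elsewhere).
            ready = 0.0
            for parent in deps.get(task, ()):
                p_vm, _, p_finish = assignment[parent]
                delay = 0.0 if p_vm == vm else transfer_cost
                ready = max(ready, p_finish + delay)
            # Tasks are appended after the VM's last task; idle gaps are not reused.
            start = max(ready, vm_free.get(vm, 0.0))
            finish = start + runtime
            if best is None or finish < best[2]:
                best = (vm, start, finish)
        assignment[task] = best
        vm_free[best[0]] = best[2]

    makespan = max(finish for _, _, finish in assignment.values())
    return assignment, makespan
```

For example, a diamond-shaped workflow in which task A feeds B and C, which both feed D, can be scheduled on two VMs by passing `deps={"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}` together with a runtime table for each task. The greedy rule deliberately ignores idle gaps and uses one uniform transfer delay; cost-aware, deadline-aware, and data-aware schedulers for clouds and multi-site settings refine exactly these simplifications.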



Author information

Corresponding author: Samiya Khan.


Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Khan, S., Ali, S.A., Hasan, N., Shakil, K.A., Alam, M. (2019). Big Data Scientific Workflows in the Cloud: Challenges and Future Prospects. In: Das, H., Barik, R., Dubey, H., Roy, D. (eds) Cloud Computing for Geospatial Big Data Analytics. Studies in Big Data, vol 49. Springer, Cham. https://doi.org/10.1007/978-3-030-03359-0_1
