Abstract
Recently, some Scientific Workflow Management Systems (SWfMSs) with provenance support (e.g. Chiron) have been deployed in the cloud. However, they typically use a single cloud site. In this paper, we consider a multisite cloud, where the data and computing resources are distributed at different sites (possibly in different regions). Based on a multisite architecture of SWfMS, i.e. multisite Chiron, we propose a multisite task scheduling algorithm that considers the time to generate provenance data. We performed an extensive experimental evaluation of our algorithm using Microsoft Azure multisite cloud and two real-life scientific workflows (Buzz and Montage). The results show that our scheduling algorithm is up to 49,6% better than baseline algorithms in terms of total execution time.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
For instance, the time to execute “SELECT count(*) from eactivity” at the provenance database from each site: 0.0027s from WEU site, 0.0253s from NEU site and 0.1117s from CUS site.
References
Microsoft Azure. http://azure.microsoft.com
Montage. http://montage.ipac.caltech.edu/docs/gridtools.html
Parameters of different types of vms in microsoft Azure. https://azure.microsoft.com/en-us/pricing/details/virtual-machines/
Dean, J., Ghemawat, S.: Mapreduce: simplified data processing on large clusters. In: 6th Symposium on Operating System Design and Implementation (OSDI), pp. 137–150 (2004)
Deelman, E., Gannon, D., Shields, M., Taylor, I.: Workflows and e-science: an overview of workflow system features and capabilities. Future Gener. Comput. Syst. 25(5), 528–540 (2009)
Deelman, E., Singh, G., Livny, M., Berriman, B., Good, J.: The cost of doing science on the cloud: the montage example. In: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12 (2008)
Deelman, E., Singh, G., Su, M.-H., Blythe, J., Gil, Y., Kesselman, C., Mehta, G., Vahi, K., Berriman, G.B., Good, J., Laity, A., Jacob, J.C., Katz, D.S.: Pegasus: a framework for mapping complex scientific workflows onto distributed systems. Sci. Program. 13(3), 219–237 (2005)
Dias, J., Ogasawara, E.S., de Oliveira, D., Porto, F., Valduriez, P., Mattoso, M.: Algebraic dataflows for big data analysis. In: IEEE International Conference on Big Data, pp. 150–155 (2013)
Duan, R., Prodan, R., Li, X.: Multi-objective game theoretic scheduling of bag-of-tasks workflows on hybrid clouds. IEEE Trans. Cloud Comput. 2(1), 29–42 (2014)
Etminani, K., Naghibzadeh, M.: A min-min max-min selective algorihtm for grid task scheduling. In: The Third IEEE/IFIP International Conference in Central Asia on Internet (ICI 2007), pp. 1–7 (2007)
Liu, J., Pacitti, E., Valduriez, P., de Oliveira, D., Mattoso, M.: Multi-objective scheduling of scientific workflows in multisite clouds. Future Gener. Comput. Syst. 63, 76–95 (2016)
Liu, J., Pacitti, E., Valduriez, P., Mattoso, M.: A survey of data-intensive scientific workflow management. J. Grid Comput. 13(4), 1–37 (2015)
Liu, J., Silva, V., Pacitti, E., Valduriez, P., Mattoso, M.: Scientific workflow partitioning in multisite cloud. In: Lopes, L., et al. (eds.) Euro-Par 2014. LNCS, vol. 8805, pp. 105–116. Springer, Cham (2014). doi:10.1007/978-3-319-14325-5_10
Maheswaran, M., Ali, S., Siegel, H.J., Hensgen, D., Freund, R.F.: Dynamic matching and scheduling of a class of independent tasks onto heterogeneous computing systems. In: 8th Heterogeneous Computing Workshop, p. 30 (1999)
Ogasawara, E.S., Dias, J., Silva, V., Chirigati, F.S., de Oliveira, D., Porto, F., Valduriez, P., Mattoso, M.: Chiron: a parallel engine for algebraic scientific workflows. Concurr. Comput. Pract. Exp. 25(16), 2327–2341 (2013)
Pineda-Morales, L., Costan, A., Antoniu, G.: Towards multi-site metadata management for geographically distributed cloud workflows. In: 2015 IEEE International Conference on Cluster Computing, (CLUSTER), pp. 294–303 (2015)
Smanchat, S., Indrawan, M., Ling, S., Enticott, C., Abramson, D.: Scheduling multiple parameter sweep workflow instances on the grid. In: 5th IEEE International Conference on E-Science, pp. 300–306 (2009)
Topcuouglu, H., Hariri, S., Wu, M.: Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE Trans. Parallel Distrib. Syst. 13(3), 260–274 (2002)
Wieczorek, M., Prodan, R., Fahringer, T.: Scheduling of scientific workflows in the ASKALON grid environment. SIGMOD Rec. 34(3), 56–62 (2005)
Acknowledgment
Work partially funded by EU H2020 Programme and MCTI/RNP-Brazil (HPC4E grant agreement number 689772), CNPq, FAPERJ, and INRIA (MUSIC project), Microsoft (ZcloudFlow project) and performed in the context of the Computational Biology Institute (www.ibc-montpellier.fr). We would like to thank Weiwei Chen and Pegasus project for the help in modeling and executing the Montage SWf.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Liu, J., Pacitti, E., Valduriez, P., Mattoso, M. (2017). Scientific Workflow Scheduling with Provenance Support in Multisite Cloud. In: Dutra, I., Camacho, R., Barbosa, J., Marques, O. (eds) High Performance Computing for Computational Science – VECPAR 2016. VECPAR 2016. Lecture Notes in Computer Science(), vol 10150. Springer, Cham. https://doi.org/10.1007/978-3-319-61982-8_19
Download citation
DOI: https://doi.org/10.1007/978-3-319-61982-8_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-61981-1
Online ISBN: 978-3-319-61982-8
eBook Packages: Computer ScienceComputer Science (R0)