Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud

Heidsieck, Gaëtan; de Oliveira, Daniel; Pacitti, Esther; Pradal, Christophe; Tardieu, François; Valduriez, Patrick

doi:10.1007/978-3-030-27618-8_33

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11707))

Included in the following conference series:

International Conference on Database and Expert Systems Applications

763 Accesses
13 Citations

Abstract

Many scientific experiments are now carried on using scientific workflows, which are becoming more and more data-intensive and complex. We consider the efficient execution of such workflows in the cloud. Since it is common for workflow users to reuse other workflows or data generated by other workflows, a promising approach for efficient workflow execution is to cache intermediate data and exploit it to avoid task re-execution. In this paper, we propose an adaptive caching solution for data-intensive workflows in the cloud. Our solution is based on a new scientific workflow management architecture that automatically manages the storage and reuse of intermediate data and adapts to the variations in task execution times and output data size. We evaluated our solution by implementing it in the OpenAlea system and performing extensive experiments on real data with a data-intensive application in plant phenotyping. The results show that adaptive caching can yield major performance gains, e.g., up to 120.16% with 6 workflow re-executions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Adams, I.F., Long, D.D., Miller, E.L., Pasupathy, S., Storer, M.W.: Maximizing efficiency by trading storage for computation. In: HotCloud (2009)
Google Scholar
Altintas, I., Barney, O., Jaeger-Frank, E.: Provenance collection support in the Kepler scientific workflow system. In: Moreau, L., Foster, I. (eds.) IPAW 2006. LNCS, vol. 4145, pp. 118–132. Springer, Heidelberg (2006). https://doi.org/10.1007/11890850_14
Chapter Google Scholar
Artzet, S., Brichet, N., Chopard, J., Mielewczik, M., Fournier, C., Pradal, C.: OpenAlea.Phenomenal: a workflow for plant phenotyping, September 2018. https://doi.org/10.5281/zenodo.1436634
Callahan, S.P., Freire, J., Santos, E., Scheidegger, C.E., Silva, C.T., Vo, H.T.: VisTrails: visualization meets data management. In: ACM SIGMOD International Conference on Management of Data (SIGMOD), pp. 745–747 (2006)
Google Scholar
Cohen-Boulakia, S., et al.: Scientific workflows for computational reproducibility in the life sciences: status, challenges and opportunities. Future Gener. Comput. Syst. (FGCS) 75, 284–298 (2017)
Article Google Scholar
Deelman, E., Singh, G., Livny, M., Berriman, B., Good, J.: The cost of doing science on the cloud: the montage example. In: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1–12 (2008)
Google Scholar
Garijo, D., Alper, P., Belhajjame, K., Corcho, O., Gil, Y., Goble, C.: Common motifs in scientific workflows: an empirical analysis. Future Gener. Comput. Syst. (FGCS) 36, 338–351 (2014)
Article Google Scholar
Kelling, S., et al.: Data-intensive science: a new paradigm for biodiversity studies. BioScience 59(7), 613–620 (2009)
Article Google Scholar
Liu, J., et al.: Efficient scheduling of scientific workflows using hot metadata in a multisite cloud. IEEE Trans. Knowl. Data Eng. 1–20 (2018)
Google Scholar
Liu, J., Pacitti, E., Valduriez, P., Mattoso, M.: A survey of data-intensive scientific workflow management. J. Grid Comput. 13(4), 457–493 (2015)
Article Google Scholar
Ogasawara, E., Dias, J., Oliveira, D., Porto, F., Valduriez, P., Mattoso, M.: An algebraic approach for data-centric scientific workflows. Proc. VLDB Endow. (PVLDB) 4(12), 1328–1339 (2011)
Google Scholar
de Oliveira, D., Baião, F.A., Mattoso, M.: Towards a taxonomy for cloud computing from an e-Science perspective. In: Antonopoulos, N., Gillam, L. (eds.) Cloud Computing. Computer Communications and Networks, pp. 47–62. Springer, London (2010). https://doi.org/10.1007/978-1-84996-241-4_3
Chapter Google Scholar
Özsu, M.T., Valduriez, P.: Principles of Distributed Database Systems, 3rd edn. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-8834-8
Book Google Scholar
Pradal, C., et al.: InfraPhenoGrid: a scientific workflow infrastructure for plant phenomics on the grid. Future Gener. Comput. Syst. (FGCS) 67, 341–353 (2017)
Article Google Scholar
Pradal, C., Cohen-Boulakia, S., Heidsieck, G., Pacitti, E., Tardieu, F., Valduriez, P.: Distributed management of scientific workflows for high-throughput plant phenotyping. ERCIM News 113, 36–37 (2018)
Google Scholar
Roitsch, T., et al.: Review: new sensors and data-driven approaches–a path to next generation phenomics. Plant Sci. 282, 2–10 (2019)
Article Google Scholar
Tardieu, F., Cabrera-Bosquet, L., Pridmore, T., Bennett, M.: Plant phenomics, from sensors to knowledge. Curr. Biol. 27(15), R770–R783 (2017)
Article Google Scholar
Yuan, D., Yang, Y., Liu, X., Chen, J.: A cost-effective strategy for intermediate data storage in scientific cloud workflow systems. In: IEEE International Symposium on Parallel and Distributed Processing (IPDPS), pp. 1–12 (2010)
Google Scholar
Yuan, D., et al.: A highly practical approach toward achieving minimum data sets storage cost in the cloud. IEEE Trans. Parallel Distrib. Syst. 24(6), 1234–1244 (2013)
Article Google Scholar
Zhang, J., et al.: Bridging VisTrails scientific workflow management system to high performance computing. In: 2013 IEEE Ninth World Congress on Services, pp. 29–36. IEEE (2013)
Google Scholar

Download references

Acknowledgments

This work was supported by the #DigitAg French Convergence Lab. on Digital Agriculture (http://www.hdigitag.fr/com), the SciDISC Inria associated team with Brazil, the Phenome-Emphasis project (ANR-11-INBS-0012) and IFB (ANR-11-INBS-0013) from the Agence Nationale de la Recherche and the France Grille Scientific Interest Group.

Author information

Authors and Affiliations

Inria & LIRMM, Univ. Montpellier, Montpellier, France
Gaëtan Heidsieck, Esther Pacitti, Christophe Pradal & Patrick Valduriez
CIRAD & AGAP, Montpellier SupAgro, Montpellier, France
Christophe Pradal
INRA & LEPSE, Montpellier SupAgro, Montpellier, France
François Tardieu
Institute of Computing, UFF, Niterói, Brazil
Daniel de Oliveira

Authors

Gaëtan Heidsieck
View author publications
You can also search for this author in PubMed Google Scholar
Daniel de Oliveira
View author publications
You can also search for this author in PubMed Google Scholar
Esther Pacitti
View author publications
You can also search for this author in PubMed Google Scholar
Christophe Pradal
View author publications
You can also search for this author in PubMed Google Scholar
François Tardieu
View author publications
You can also search for this author in PubMed Google Scholar
Patrick Valduriez
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Gaëtan Heidsieck .

Editor information

Editors and Affiliations

Clausthal University of Technology, Clausthal-Zellerfeld, Germany
Sven Hartmann
Johannes Kepler University of Linz, Linz, Austria
Josef Küng
The University of Texas at Arlington, Arlington, TX, USA
Sharma Chakravarthy
Johannes Kepler University of Linz, Linz, Austria
Gabriele Anderst-Kotsis
Software Competence Center Hagenberg, Hagenberg im Mühlkreis, Austria
A Min Tjoa
Johannes Kepler University of Linz, Linz, Austria
Ismail Khalil

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Heidsieck, G., de Oliveira, D., Pacitti, E., Pradal, C., Tardieu, F., Valduriez, P. (2019). Adaptive Caching for Data-Intensive Scientific Workflows in the Cloud. In: Hartmann, S., Küng, J., Chakravarthy, S., Anderst-Kotsis, G., Tjoa, A., Khalil, I. (eds) Database and Expert Systems Applications. DEXA 2019. Lecture Notes in Computer Science(), vol 11707. Springer, Cham. https://doi.org/10.1007/978-3-030-27618-8_33

Download citation

DOI: https://doi.org/10.1007/978-3-030-27618-8_33
Published: 06 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-27617-1
Online ISBN: 978-3-030-27618-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics