A New Algorithm for Intermediate Dataset Storage in a Cloud-Based Dataflow

Cheng, Jie; Zhu, Daming; Zhu, Binhai

doi:10.1007/978-3-319-19647-3_4

Conference paper
First Online: 01 January 2015

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9130)

Abstract

Running a dataflow in a cloud environment usually generates many useful intermediate datasets. A strategy for running a dataflow specifies which datasets are stored; the remaining datasets are regenerated when they are needed. The intermediate dataset storage (IDS) problem asks for a strategy that minimizes the total cost of running the dataflow. The current best algorithm for linear-structure IDS takes \(O(n^4)\) time, where "linear-structure" means that the datasets in the dataflow form a pipeline. In this paper, we present a new algorithm for this problem that improves the time complexity to \(O(n^3)\), where \(n\) is the number of datasets in the pipeline.
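To make the problem statement concrete, the following is a minimal brute-force sketch of linear-structure IDS. It is not the algorithm of this paper, which the abstract does not describe. The cost model is an assumption made for illustration: each dataset has a hypothetical generation cost, a storage cost rate, and a usage frequency; a stored dataset pays its storage cost rate, and a deleted dataset pays its usage frequency times the cost of regenerating it from the nearest stored predecessor. The names `strategy_cost` and `best_strategy` are likewise hypothetical.

```python
from itertools import product

def strategy_cost(gen_cost, store_cost, usage, stored):
    """Total cost rate of one strategy for a pipeline d_0 -> d_1 -> ... -> d_{n-1}.

    Assumed cost model (not taken from the paper):
      stored dataset  -> pays its storage cost rate
      deleted dataset -> pays usage frequency * regeneration cost, where
                         regeneration starts at the nearest stored predecessor
                         (or at the pipeline input if none is stored)
    """
    total = 0.0
    regen = 0.0  # cost of regenerating the current dataset
    for x, y, v, keep in zip(gen_cost, store_cost, usage, stored):
        if keep:
            total += y
            regen = 0.0      # later datasets can restart from this one
        else:
            regen += x       # must also recompute this dataset
            total += v * regen
    return total

def best_strategy(gen_cost, store_cost, usage):
    """Try all 2^n store/delete strategies; only practical for small n."""
    n = len(gen_cost)
    return min(
        (strategy_cost(gen_cost, store_cost, usage, s), s)
        for s in product((False, True), repeat=n)
    )

# Hypothetical three-dataset pipeline.
gen_cost = [10, 2, 8]
store_cost = [1, 5, 3]
usage = [0.5, 0.1, 1.0]
cost, stored = best_strategy(gen_cost, store_cost, usage)
print(cost, stored)
```

With these made-up numbers, the cheapest strategy stores the first and third datasets and regenerates the second on demand, since the second is cheap to rebuild and rarely used. Exhaustive search takes exponential time, which is why polynomial-time algorithms such as the ones the abstract compares matter for long pipelines.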

Acknowledgements

This work is supported by the National Natural Science Foundation of China (61472222) and the Natural Science Foundation of Shandong Province (ZR2012Z002).

Author information

Authors and Affiliations

School of Mechanical, Electrical and Information Engineering, Shandong University, Weihai, China
Jie Cheng
School of Computer Science and Technology, Shandong University, Jinan, China
Daming Zhu
Department of Computer Science, Montana State University, Bozeman, MT, 59717-3880, USA
Binhai Zhu

Corresponding author

Correspondence to Daming Zhu.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Jianxin Wang
Courant Institute, New York University, New York, New York, USA
Chee Yap

About this paper

Cite this paper

Cheng, J., Zhu, D., Zhu, B. (2015). A New Algorithm for Intermediate Dataset Storage in a Cloud-Based Dataflow. In: Wang, J., Yap, C. (eds) Frontiers in Algorithmics. FAW 2015. Lecture Notes in Computer Science, vol 9130. Springer, Cham. https://doi.org/10.1007/978-3-319-19647-3_4

DOI: https://doi.org/10.1007/978-3-319-19647-3_4
Published: 27 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19646-6
Online ISBN: 978-3-319-19647-3
eBook Packages: Computer Science, Computer Science (R0)