An Efficient Algorithm for Runtime Minimum Cost Data Storage and Regeneration for Business Process Management in Multiple Clouds
The proliferation of cloud computing gives users flexible ways to use cloud resources for complex data applications such as Business Process Management (BPM) systems. In a BPM system, users may upload, generate, process, transfer, store, share, or access many kinds of data, and these data may be complex and very large. Under the pay-as-you-go pricing model of cloud computing, improper use of cloud resources incurs high costs for users. Because data in a typical BPM system can be regenerated, transferred, and stored across multiple clouds, a data storage, transfer, and regeneration strategy is needed to reduce the cost of resource usage. The current state-of-the-art algorithm can find a strategy that achieves the minimum data storage, transfer, and computation cost; however, it has very high computational complexity and is neither efficient nor practical to apply at runtime. In this paper, by thoroughly investigating the trade-off in resource utilization, we propose a Provenance Candidates Elimination algorithm, which efficiently finds the minimum cost strategy for data storage, transfer, and regeneration. Through comprehensive experimental evaluation, we demonstrate that our approach calculates the minimum cost strategy in milliseconds, outperforming the existing algorithm by 2 to 4 orders of magnitude.
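The store-versus-regenerate trade-off mentioned above can be illustrated for a single dataset in isolation. The following is only a minimal sketch under assumed inputs, not the Provenance Candidates Elimination algorithm itself: the `Dataset` fields, the per-day cost rates, and the function names are hypothetical, and transfer cost and provenance dependencies between datasets are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # All fields are hypothetical illustration values, not from the paper.
    name: str
    storage_cost_per_day: float  # cost of keeping the dataset stored for one day
    regeneration_cost: float     # cost of recomputing the dataset once
    uses_per_day: float          # average number of accesses per day


def regeneration_cost_per_day(d: Dataset) -> float:
    """Expected daily cost if the dataset is deleted and recomputed on each use."""
    return d.regeneration_cost * d.uses_per_day


def cheaper_to_store(d: Dataset) -> bool:
    """True if storing costs less per day than regenerating on demand."""
    return d.storage_cost_per_day < regeneration_cost_per_day(d)


def daily_cost(d: Dataset) -> float:
    """Daily cost under the cheaper of the two options."""
    return min(d.storage_cost_per_day, regeneration_cost_per_day(d))


rarely_used = Dataset("rarely_used", storage_cost_per_day=0.5,
                      regeneration_cost=2.0, uses_per_day=0.1)
often_used = Dataset("often_used", storage_cost_per_day=0.5,
                     regeneration_cost=2.0, uses_per_day=1.0)

for d in (rarely_used, often_used):
    choice = "store" if cheaper_to_store(d) else "regenerate"
    print(f"{d.name}: {choice}, daily cost {daily_cost(d):.2f}")
```

With these assumed numbers, the rarely used dataset is cheaper to regenerate on demand and the frequently used one is cheaper to keep stored. With several clouds and datasets that depend on one another, the per-dataset choices interact, which is why a dedicated search strategy is needed.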
Keywords: Cloud computing, Business Process Management, Datasets storage and regeneration
The research work was supported by the National Key R&D Program (2017YFB1400102, 2016YFB1000602), NSFC (61572295), SDNSFC (No. ZR2017ZB0420), and Shandong Major scientific and technological innovation projects (2018YFJH0506).