CAD: An Efficient Data Management and Migration Scheme across Clouds for Data-Intensive Scientific Applications

  • Conference paper
Data Management in Grid and Peer-to-Peer Systems (Globe 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6864)

Abstract

Data management and migration are important research challenges in emerging Cloud environments. When data moves among different geographical domains, lowering the transmission cost is important for performance. Efficient scheduling methods make it possible to complete data transmissions in fewer steps and in less time. Several methods have been proposed in the literature to manage data and minimize transmission cost in single-cluster environments. Unfortunately, these methods are not suited to large-scale, complex environments such as Clouds, particularly with regard to scheduling policies. Motivated by this, we propose an efficient data transmission method for data-intensive scientific applications over Clouds, called Cloud Adaptive Dispatching (CAD). The method adapts to the specific characteristics of Cloud systems and reduces the transmission cost, while also avoiding node contention as data moves from site to site. We conduct an extensive experimental campaign to test the performance of CAD. The results demonstrate the improvements CAD offers in supporting data transmissions across Clouds for data-intensive scientific applications.
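
To make the notions of "steps", "transmission cost", and "node contention" concrete, the following is a minimal illustrative sketch. It is not the CAD algorithm, which the abstract does not specify. It assumes a common model from the message-scheduling literature: in each step a node may send at most one message and receive at most one message, a step lasts as long as its largest message, and the total cost is the sum of the step durations. The names `schedule_contention_free` and `total_cost`, the greedy largest-first heuristic, and the sample transfers are all hypothetical.

```python
from typing import List, Set, Tuple

# (source node, destination node, message size in arbitrary units)
Transfer = Tuple[str, str, int]


def schedule_contention_free(transfers: List[Transfer]) -> List[List[Transfer]]:
    """Greedy sketch (NOT CAD): place each transfer, largest first, into the
    earliest step in which its sender and its receiver are both still free."""
    steps: List[List[Transfer]] = []
    senders: List[Set[str]] = []    # nodes already sending in each step
    receivers: List[Set[str]] = []  # nodes already receiving in each step

    for transfer in sorted(transfers, key=lambda t: t[2], reverse=True):
        src, dst, _ = transfer
        for i, step in enumerate(steps):
            if src not in senders[i] and dst not in receivers[i]:
                step.append(transfer)
                senders[i].add(src)
                receivers[i].add(dst)
                break
        else:
            # No existing step can host this transfer without contention.
            steps.append([transfer])
            senders.append({src})
            receivers.append({dst})
    return steps


def total_cost(steps: List[List[Transfer]]) -> int:
    """Assumed cost model: each step takes as long as its largest message."""
    return sum(max(size for _, _, size in step) for step in steps)


if __name__ == "__main__":
    # Hypothetical transfers between two source sites and two destination sites.
    demo = [("A", "X", 8), ("A", "Y", 3), ("B", "X", 5), ("B", "Y", 7)]
    plan = schedule_contention_free(demo)
    for number, step in enumerate(plan, start=1):
        print(f"step {number}: {step}")
    print("total cost:", total_cost(plan))
```

Under these assumptions the four sample transfers fit into two steps, with large messages grouped together so that no step is held up by a single oversized message while others wait. A real cross-Cloud scheduler would also need to account for heterogeneous link bandwidths between sites, which this sketch ignores.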


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hsu, CH., Cuzzocrea, A., Chen, SC. (2011). CAD: An Efficient Data Management and Migration Scheme across Clouds for Data-Intensive Scientific Applications. In: Hameurlain, A., Tjoa, A.M. (eds) Data Management in Grid and Peer-to-Peer Systems. Globe 2011. Lecture Notes in Computer Science, vol 6864. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22947-3_11

  • DOI: https://doi.org/10.1007/978-3-642-22947-3_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22946-6

  • Online ISBN: 978-3-642-22947-3

  • eBook Packages: Computer Science, Computer Science (R0)
