Scheduling File Transfers for Data-Intensive Jobs on Heterogeneous Clusters

  • Gaurav Khanna
  • Umit Catalyurek
  • Tahsin Kurc
  • P. Sadayappan
  • Joel Saltz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)

Abstract

This paper addresses the problem of efficient collective scheduling of file transfers requested by a batch of tasks. Our work targets a heterogeneous collection of storage and compute clusters. The goal is to minimize the overall time to transfer files to their respective destination nodes. Two scheduling schemes are proposed and experimentally evaluated against an existing approach, Insertion Scheduling. The first is a 0-1 Integer Programming approach based on the idea of time-expanded networks. This scheme achieves the minimum total file transfer time, but has significant scheduling overhead. To address this issue, we propose a heuristic approach based on maximum weight graph matching. This scheme performs as well as Insertion Scheduling and has much lower scheduling overhead. We conclude that the heuristic scheme is a better fit for larger workloads and systems.
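
To make the matching-based idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a simplified model that the abstract does not state: sources and destinations are disjoint node sets, each node takes part in at most one transfer at a time, and a round lasts as long as its slowest transfer. The edge weight (the size of the largest pending file between a pair of nodes) and the names `max_weight_matching` and `schedule_rounds` are hypothetical choices made for illustration. The matching itself is computed exactly by a bitmask search that is only practical for small inputs.

```python
from functools import lru_cache


def max_weight_matching(weights):
    """Exact maximum-weight bipartite matching (small inputs only).

    weights maps (src, dst) -> positive weight.
    Returns the set of (src, dst) pairs in a best matching.
    """
    srcs = sorted({s for s, _ in weights})
    dsts = sorted({d for _, d in weights})

    @lru_cache(maxsize=None)
    def best(i, used):
        # Best (weight, pairs) for sources i.. given the destinations in `used`.
        if i == len(srcs):
            return 0.0, ()
        result = best(i + 1, used)  # option: leave source i idle this round
        for j, d in enumerate(dsts):
            w = weights.get((srcs[i], d))
            if w is None or used & (1 << j):
                continue
            sub_w, sub_pairs = best(i + 1, used | (1 << j))
            if w + sub_w > result[0]:
                result = (w + sub_w, ((srcs[i], d),) + sub_pairs)
        return result

    return set(best(0, 0)[1])


def schedule_rounds(transfers, bandwidth):
    """Group transfers into rounds of node-disjoint transfers.

    transfers: list of (file, src, dst, size) with size > 0 (assumed).
    bandwidth: dict mapping (src, dst) -> transfer rate (assumed known).
    Returns (rounds, estimated_total_time).
    """
    if any(size <= 0 for _, _, _, size in transfers):
        raise ValueError("file sizes must be positive")
    pending = list(transfers)
    rounds, total = [], 0.0
    while pending:
        # Hypothetical weighting: largest pending file per (src, dst) pair.
        candidate = {}
        for t in pending:
            key = (t[1], t[2])
            if key not in candidate or t[3] > candidate[key][3]:
                candidate[key] = t
        weights = {k: t[3] for k, t in candidate.items()}
        chosen = [candidate[k] for k in sorted(max_weight_matching(weights))]
        rounds.append(chosen)
        # A round ends when its slowest transfer finishes (assumed model).
        total += max(t[3] / bandwidth[(t[1], t[2])] for t in chosen)
        for t in chosen:
            pending.remove(t)
    return rounds, total
```

For example, with two storage nodes `s1`, `s2`, two compute nodes `c1`, `c2`, and four pending files, each call to `schedule_rounds` returns the rounds of concurrent transfers and the estimated completion time under the assumptions above. A production scheduler would replace the exhaustive search with a polynomial-time matching algorithm.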

Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Gaurav Khanna (1)
  • Umit Catalyurek (2)
  • Tahsin Kurc (2)
  • P. Sadayappan (1)
  • Joel Saltz (2)

  1. Dept. of Computer Science and Engineering
  2. Dept. of Biomedical Informatics, The Ohio State University
