Adaptive Scheduling for Task Farming with Grid Middleware

  • Henri Casanova
  • MyungHo Kim
  • James S. Plank
  • Jack J. Dongarra
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1685)


Scheduling in metacomputing environments is an active field of research as the vision of a Computational Grid becomes more concrete. An important class of Grid applications consists of long-running parallel computations with large numbers of somewhat independent tasks (Monte Carlo simulations, parameter-space searches, etc.). A number of Grid middleware projects are available to implement such applications, but scheduling strategies remain an open research issue. This is mainly due to the diversity of both Grid resource types and their availability patterns. The purpose of this work is to develop and validate a general adaptive scheduling algorithm for task farming applications, along with a user interface that makes the algorithm accessible to domain scientists. Our algorithm is general in that it is not tailored to a particular Grid middleware and requires very few assumptions concerning the nature of the resources. Our first testbed is NetSolve, as it allows quick and easy development of the algorithm by isolating the developer from issues such as process control, I/O, remote software access, and fault tolerance.


Farming, Master-Slave Parallelism, Scheduling, Metacomputing, Grid Computing



Copyright information

© Springer-Verlag Berlin Heidelberg 1999

Authors and Affiliations

  • Henri Casanova (1)
  • MyungHo Kim (2)
  • James S. Plank (3)
  • Jack J. Dongarra (3, 4)
  1. Department of Computer Science and Engineering, University of California at San Diego, La Jolla, USA
  2. School of Computing, SoongSil University, Seoul, Korea
  3. Department of Computer Science, University of Tennessee, Knoxville, USA
  4. Mathematical Science Section, Oak Ridge National Laboratory, Oak Ridge, USA
