Abstract
High-performance computing often involves sets of jobs or workloads that must be scheduled. If the jobs have ordering dependencies (e.g., pipelines or directed acyclic graphs), the user often has to submit them manually and carefully in the right order, or delay submitting dependent jobs until other jobs have finished. If the user can instead submit the entire workload together with its dependencies, the scheduler has more information about future jobs in the workflow.
We have designed and implemented TrellisDAG, a system that combines the use of placeholder scheduling and a subsystem for describing workflows to provide novel mechanisms for computing non-trivial workloads with inter-job dependencies. TrellisDAG also has a modular architecture for implementing different scheduling policies, which will be the topic of future work. Currently, TrellisDAG supports:
1. A spectrum of mechanisms for users to specify both simple and complicated workflows.
2. The ability to load balance across multiple administrative domains.
3. A convenient tool to monitor complicated workflows.
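To make the idea of a workload with inter-job dependencies concrete, the following is a minimal sketch of how a dependency graph determines which jobs may be released together. It is not TrellisDAG's interface or scheduling policy: the job names (`fetch`, `align`, `annotate`, `report`) and the helper `release_waves` are hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
dependencies = {
    "align": {"fetch"},
    "annotate": {"fetch"},
    "report": {"align", "annotate"},
}


def release_waves(deps):
    """Return lists of jobs that may run concurrently, in dependency order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        # Jobs whose prerequisites have all been marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Assumption for illustration: every released job finishes at once.
        sorter.done(*ready)
    return waves


waves = release_waves(dependencies)
for index, wave in enumerate(waves):
    print(index, wave)
```

Because the whole graph is known up front, the jobs in each wave can be dispatched without the user delaying submissions by hand. That is the kind of advance information the abstract describes the scheduler as gaining.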
© 2003 Springer-Verlag Berlin Heidelberg
Goldenberg, M., Lu, P., Schaeffer, J. (2003). TrellisDAG: A System for Structured DAG Scheduling. In: Feitelson, D., Rudolph, L., Schwiegelshohn, U. (eds) Job Scheduling Strategies for Parallel Processing. JSSPP 2003. Lecture Notes in Computer Science, vol 2862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/10968987_2
DOI: https://doi.org/10.1007/10968987_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20405-3
Online ISBN: 978-3-540-39727-4
eBook Packages: Springer Book Archive