Abstract
In this chapter we address the problem of allocating parallel tasks on a distributed-memory machine for coarse-grain applications represented by parameterized task graphs (PTGs). A PTG is a new computation model for representing directed acyclic task graphs (DAGs) symbolically. The size of a PTG is independent of the problem size, and its parameters can be instantiated at run time. Parameter-independent optimization is important for exploiting non-static parallelism in scientific computing programs with varying problem sizes, and previous DAG scheduling algorithms are not able to handle such cases. We present and study a symbolic scheduling algorithm called SLC (Symbolic Linear Clustering), which derives task clusters from a PTG using piecewise affine mapping functions and then evenly assigns clusters to processors. Our experimental results show that the proposed method is effective for a number of compute-intensive problems in scientific applications.
Copyright information
© 2000 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Cosnard, M., Jeannot, E., Yang, T. (2000). Symbolic Scheduling of Parameterized Task Graphs on Parallel Machines. In: Pardalos, P.M., Pitsoulis, L.S. (eds) Nonlinear Assignment Problems. Combinatorial Optimization, vol 7. Springer, Boston, MA. https://doi.org/10.1007/978-1-4757-3155-2_9
DOI: https://doi.org/10.1007/978-1-4757-3155-2_9
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-4841-0
Online ISBN: 978-1-4757-3155-2
eBook Packages: Springer Book Archive