Symbolic Scheduling of Parameterized Task Graphs on Parallel Machines

  • Michel Cosnard
  • Emmanuel Jeannot
  • Tao Yang
Part of the Combinatorial Optimization book series (COOP, volume 7)

Abstract

In this chapter we address the problem of allocating parallel tasks on a distributed-memory machine for coarse-grain applications represented by parameterized task graphs (PTGs). A PTG is a new computation model for representing directed acyclic task graphs (DAGs) symbolically: its size is independent of the problem size, and its parameters can be instantiated at run time. Parameter-independent optimization is important for exploiting non-static parallelism in scientific computing programs with varying problem sizes, and previous DAG scheduling algorithms cannot handle such cases. We present and study a symbolic scheduling algorithm called SLC (Symbolic Linear Clustering), which derives task clusters from a PTG using piecewise affine mapping functions and then evenly assigns the clusters to processors. Our experimental results show that the proposed method is effective for a number of compute-intensive problems in scientific applications.
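
The idea of a symbolic mapping can be illustrated with a small Python sketch. It is not the SLC algorithm from the chapter, only a toy under stated assumptions: a hypothetical Gaussian-elimination-like task family T(k, j) with 1 <= k < j <= n, an assumed dependence from T(k, j) to T(k+1, j) so that each column forms a chain, a hand-picked affine mapping `cluster_of`, and a cyclic cluster-to-processor rule `processor_of`. All names are invented for this illustration.

```python
from collections import Counter


def cluster_of(k: int, j: int) -> int:
    """Hypothetical affine mapping: task T(k, j) goes to cluster 0*k + 1*j.

    The rule is written once, symbolically, and does not depend on n.
    """
    return 0 * k + 1 * j


def processor_of(cluster: int, p: int) -> int:
    """Assumed cyclic assignment of clusters to p processors."""
    return cluster % p


def instantiate(n: int) -> list[tuple[int, int]]:
    """Enumerate the task instances T(k, j), 1 <= k < j <= n, for a given n."""
    return [(k, j) for k in range(1, n) for j in range(k + 1, n + 1)]


def clusters_per_processor(n: int, p: int) -> Counter:
    """Count how many distinct clusters land on each processor."""
    clusters = {cluster_of(k, j) for k, j in instantiate(n)}
    return Counter(processor_of(c, p) for c in clusters)


if __name__ == "__main__":
    # The same symbolic rules are reused for different problem sizes.
    for n in (6, 9):
        print(n, dict(clusters_per_processor(n, p=4)))
```

The mapping functions are fixed before n is known; only `instantiate` depends on the problem size, which mirrors the run-time instantiation of parameters described above. Because the cluster identifiers here are consecutive integers, the cyclic rule gives every processor nearly the same number of clusters.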

Keywords

Parallel Machine; Gaussian Elimination; Task Graph; Task Instance; Linear Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer Science+Business Media Dordrecht 2000

Authors and Affiliations

  • Michel Cosnard (1)
  • Emmanuel Jeannot (2)
  • Tao Yang (3)
  1. LORIA, INRIA Lorraine, Villers les Nancy, France
  2. LaBRI, University of Bordeaux I, Talence Cedex, France
  3. CS Dept., UCSB, Engr Building I, Santa Barbara, USA
